Preface


Teaching risk management in finance 


This book is a handbook for Master’s students in finance who want to learn risk
management. It corresponds to the lecture notes of my course “Risk Management &
Financial Regulation” at the University of Paris Saclay. The title highlights the role of
financial regulation. Indeed, financial regulation is an important component for
understanding the practice of risk management in finance. This is particularly true in the
banking sector, but it is also valid in other financial sectors. At first sight, it may seem
curious to teach, for example, the standards developed by the Basel Committee. They are
freely available and any student may consult them. However, the regulation is so complex
and the documentation produced is so abundant that students (and also professionals) may
be lost when they want an overview of a specific topic or seek particular information.
Therefore, I consider that the primary role of a course in risk management is to understand
the financial regulation in general terms and to be able to navigate between the various
regulatory standards. This is all the more important because financial regulation has been
everywhere since the 2008 Global Financial Crisis (GFC). Today, most of the resources of a
risk management department within a bank are dedicated to the regulation, and this is also
the case for big projects. Understanding risk management therefore requires knowing the
regulation. Nevertheless, teaching risk management cannot be limited to the study of the
regulation. Another important component of risk management is risk measurement. This
requires having a statistical model for calculating the probability of a loss. A brief review
shows that there are many risk models, from the simplest to the most complicated, because
there are many types of risk and many risk factors. Moreover, the modeling of risk factors
is not an easy task and requires making assumptions, and the complexity of a model can
increase with the likelihood of these assumptions¹. Therefore, the second role of a course in
risk management is to distinguish between the mathematical models of risk measurement and
to study those that are actually used by professionals. From an academic point of view,
some models may appear to be outdated or old-fashioned. However, they can continue to be
used by risk managers for many reasons: they are more robust, easier to calibrate, etc. For
example, the most important risk measurement model is certainly the historical
value-at-risk. This is why it is important to choose the right models to study. A handbook
cannot be a comprehensive catalogue of risk management methods. But it must present the
most frequently used models and the essential mathematical tools in order to help Master’s
students when they are faced with reality and with situations that require more complex
modeling.


1 However, a complex model does not mean that the assumptions are more realistic. 


About this book 


These lecture notes are divided into two parts. After an introductory chapter presenting
the main concepts of risk management and an overview of the financial regulation, the first
part is dedicated to risk management in the banking sector and is made up of seven
chapters: market risk, credit risk, counterparty credit risk, operational risk, liquidity risk,
asset liability management risk and systemic risk. I begin with market risk, because
it allows me to introduce naturally the concept of risk factor, describe what a risk measure
is and define the risk allocation approach. For each chapter, I present the corresponding
regulatory framework and the risk management tools. I continue with five chapters that are
mainly focused on the banking sector. However, even if these six chapters are dedicated to
the banking sector, these materials also establish the basics of risk management in other
financial sectors. They are the common language that is shared by all risk managers in
finance. This first part ends with an eighth chapter on systemic risk and the shadow banking
system. In particular, this chapter supplements the introductory chapter and shows that
the risk regulation culture has affected the other non-banking financial sectors such as asset
management, insurance, pension funds and market infrastructure. The second part of these
lecture notes develops the mathematical and statistical tools used in risk management. It
contains seven chapters: model risk of exotic derivatives, statistical inference and model
estimation, copula functions, extreme value theory, Monte Carlo simulation, stress testing
methods and credit scoring models. Each chapter of these lecture notes is extensively
illustrated by numerical examples and also contains tutorial exercises. Finally, a technical
appendix completes the lecture notes and contains some important elements on numerical
analysis.

The writing of these lecture notes started in April 2015 and is the result of twenty years
of academic courses. When I began to teach risk management, a large part of my course
was dedicated to statistical tools. Over the years, however, financial regulation became
increasingly important. I am convinced that risk management is now mainly driven by the
regulation, not by the progress of the mathematical models. The writing of this book has
benefited from the existing materials of my French book called “La Gestion des Risques
Financiers”. Nevertheless, the structure of the two books is different, because my previous
book only concerned market, credit and operational risk before Basel III. Some years ago, I
decided to extend the course to other financial sectors, especially insurance, asset
management and market infrastructure. In fact, it appears that the quantitative methods of
risk management are the same across the different financial areas, even if each sector has
its particular aspects. The sectors differ mainly by the regulation, not by the mathematical
tools. Acquiring knowledge of the different regulations is not an easy task for students.
However, it is necessary if one would like to understand what the role of risk management is
in financial institutions in the present-day world. Moreover, reducing the practice of risk
management to the assimilation of the regulation rules is not sufficient. A sound
understanding of the financial products and the mathematical models is essential to know
where the risks are. This is why some parts of this book can be difficult: risk management
in finance is complex today. A companion book, which contains the solutions of the tutorial
exercises, is available in order to facilitate learning and knowledge assimilation at the
following web page:


http://www.thierry-roncalli.com/RiskManagementBook.html
Symbol Description

$\times$ Arithmetic multiplication
$\cdot$ Scalar, vector and matrix multiplication
$\ast$ Convolution
$\circ$ Hadamard product: $(x \circ y)_i = x_i y_i$
$\otimes$ Kronecker product $A \otimes B$
$|\mathcal{E}|$ Cardinality of the set $\mathcal{E}$
$\prec$ Concordance ordering
$\langle x, x' \rangle$ Inner product of $x$ and $x'$
$\mathbf{1}$ Vector of ones
$\mathbb{1}\{A\}$ The indicator function is equal to 1 if $A$ is true, 0 otherwise
$\mathbb{1}_A\{x\}$ The characteristic function is equal to 1 if $x \in A$, 0 otherwise
$\mathbf{0}$ Vector of zeros
$(A_{i,j})$ Matrix $A$ with entry $A_{i,j}$ in row $i$ and column $j$
$A^{-1}$ Inverse of the matrix $A$
$A^{1/2}$ Square root of the matrix $A$
$A^{\top}$ Transpose of the matrix $A$
$A^{+}$ Moore-Penrose pseudo-inverse of the matrix $A$
$b$ Vector of weights $(b_1, \ldots, b_n)$ for the benchmark $b$
$B_t(T)$ Price of the zero-coupon bond at time $t$ for the maturity $T$
$B(t,T)$ Alternative form of $B_t(T)$
$\mathcal{B}(p)$ Bernoulli distribution with parameter $p$
$\mathcal{B}(n,p)$ Binomial distribution with parameters $n$ and $p$
$\beta_i$ Beta of asset $i$ with respect to portfolio $w$
$\beta_i(w)$ Another notation for the symbol $\beta_i$
$\beta(w \mid b)$ Beta of portfolio $w$ when the benchmark is $b$
$\mathcal{B}(\alpha,\beta)$ Beta distribution with parameters $\alpha$ and $\beta$
$B(\alpha,\beta)$ Beta function defined as $\int_0^1 t^{\alpha-1} (1-t)^{\beta-1} \, \mathrm{d}t$
$B(x;\alpha,\beta)$ Incomplete beta function $\int_0^x t^{\alpha-1} (1-t)^{\beta-1} \, \mathrm{d}t$
$c$ Coupon rate of the CDS premium leg
$C$ (or $\rho$) Correlation matrix
$\mathcal{C}$ OTC contract
$\mathbf{C}(u_1,u_2)$ Copula function
$\mathbb{C}$ Set of copula functions
$C(i)$ Mapping function
$\breve{\mathbf{C}}(u_1,u_2)$ Survival copula
$\mathbf{C}^{-}$ Fréchet lower bound copula
$\mathbf{C}^{\perp}$ Product copula
$\mathbf{C}^{+}$ Fréchet upper bound copula
$C_t$ Price of the call option at time $t$
$C(t_m)$ Coupon paid at time $t_m$
$C_n(\rho)$ Constant correlation matrix of dimension $n$ with $\rho_{i,j} = \rho$
$\mathrm{CE}(t_0)$ Current exposure at time $t_0$
$\mathrm{cov}(X)$ Covariance of the random vector $X$
$\chi^2(\nu)$ Chi-squared distribution with $\nu$ degrees of freedom
$D$ Covariance matrix of idiosyncratic risks
$D(t)$ Liquidity duration of the new production
$D^{\star}(t)$ Liquidity duration of the production stock
$D_k(x)$ Debye function
$\det(A)$ Determinant of the matrix $A$
$\mathrm{diag}\, v$ Diagonal matrix with elements $(v_1, \ldots, v_n)$
$\Delta_t$ Delta of the option at time $t$
$\Delta_h$ Difference operator $\Delta_h V_t = V_t - V_{t-h}$ with lag $h$
$\Delta\,\mathrm{CoVaR}_i$ Delta CoVaR of institution $i$
$\Delta t_m$ Time interval $t_m - t_{m-1}$
$\delta_x(y)$ Dirac delta function
$e_i$ The value of the vector is 1 for the row $i$ and 0 elsewhere
$\mathbb{E}[X]$ Mathematical expectation of the random variable $X$
$\mathcal{E}(\lambda)$ Exponential probability distribution with parameter $\lambda$
$e(t)$ Potential future exposure at time $t$
$\mathrm{EE}(t)$ Expected exposure at time $t$
$\mathrm{EEE}(t)$ Effective expected exposure at time $t$
$\mathrm{EEPE}(0;t)$ Effective expected positive exposure for the time period $(0,t]$
$\mathrm{EnE}(t)$ Risk-neutral expected negative exposure at time $t$
$\mathrm{EpE}(t)$ Risk-neutral expected positive exposure at time $t$
$\mathrm{EPE}(0;t)$ Expected positive exposure for the time period $(0,t]$
$\mathrm{ES}_{\alpha}(w)$ Expected shortfall of portfolio $w$ at the confidence level $\alpha$
$\exp(A)$ Exponential of the matrix $A$
$f(x)$ Probability density function
$f_{i:n}(x)$ Probability density function of the order statistic $X_{i:n}$
$f_y(\lambda)$ Spectral density function of the stochastic process $y_t$
$F(x)$ Cumulative distribution function
$F_{i:n}(x)$ Cumulative distribution function of the order statistic $X_{i:n}$
$F^{-1}(\alpha)$ Quantile function
$F^{n\star}$ $n$-fold convolution of the probability distribution $F$ with itself
$\mathcal{F}$ Vector of risk factors $(\mathcal{F}_1, \ldots, \mathcal{F}_m)$
$\mathcal{F}_j$ Risk factor $j$
$\mathcal{F}_t$ Filtration
$f_t(T)$ Instantaneous forward rate at time $t$ for the maturity $T$
$f(t,T)$ Alternative form of $f_t(T)$
$F_t(T_1,T_2)$ Forward interest rate at time $t$ for the period $[T_1,T_2]$
$F(t,T_1,T_2)$ Alternative form of $F_t(T_1,T_2)$
$\mathfrak{F}(\nu_1,\nu_2)$ Fisher-Snedecor distribution with parameters $\nu_1$ and $\nu_2$
$\mathcal{G}(p)$ Geometric distribution with parameter $p$
$\mathcal{G}(\alpha)$ Standard gamma distribution with parameter $\alpha$
$\mathcal{G}(\alpha,\beta)$ Gamma distribution with parameters $\alpha$ and $\beta$
$\gamma_1$ Skewness
$\gamma_2$ Excess kurtosis
$\Gamma_t$ Gamma of the option at time $t$
$\Gamma(\alpha)$ Gamma function defined as $\int_0^{\infty} t^{\alpha-1} e^{-t} \, \mathrm{d}t$
$\gamma(\alpha,x)$ Lower incomplete gamma function defined as $\int_0^{x} t^{\alpha-1} e^{-t} \, \mathrm{d}t$
$\Gamma(\alpha,x)$ Upper incomplete gamma function defined as $\int_x^{\infty} t^{\alpha-1} e^{-t} \, \mathrm{d}t$
$\mathrm{GEV}(\mu,\sigma,\xi)$ GEV distribution with parameters $\mu$, $\sigma$ and $\xi$
$\mathrm{GPD}(\sigma,\xi)$ Generalized Pareto distribution with parameters $\sigma$ and $\xi$
$h$ Holding period
$h$ Kernel or smoothing parameter
$H^{-}$ Lower half-space
$H^{+}$ Upper half-space
$H$ Hyperplane
$H(X)$ Shannon entropy of $X$
$H(X,Y)$ Cross-entropy of $X$ and $Y$
$H(Y \mid X)$ Conditional entropy of $Y$ with respect to $X$
$i$ Asset (or component) $i$
$I_n$ Identity matrix of dimension $n$
$I(X,Y)$ Mutual information of $X$ and $Y$
$\mathcal{I}(\theta)$ Information matrix
$\mathrm{IB}(x;\alpha,\beta)$ Regularized incomplete beta function
$\mathcal{J}(\theta)$ Fisher information matrix
$K$ Regulatory capital
$\mathcal{K}(x,x')$ Kernel function of $x$ and $x'$
$\ell(\theta)$ Log-likelihood function with $\theta$ the vector of parameters to estimate
$\ell_t$ Log-likelihood function for the observation $t$
$L$ Lag operator: $L y_t = y_{t-1}$
$L$ or $L(w)$ Loss of portfolio $w$
$\mathcal{L}(x;\lambda)$ Lagrange function, whose Lagrange multiplier is $\lambda$
$\ln A$ Logarithm of the matrix $A$
$\mathcal{LG}(\alpha,\beta)$ Log-gamma distribution with parameters $\alpha$ and $\beta$
$\mathcal{LL}(\alpha,\beta)$ Log-logistic distribution with parameters $\alpha$ and $\beta$
$\mathcal{LN}(\mu,\sigma^2)$ Log-normal distribution with parameters $\mu$ and $\sigma$
$\lambda$ Parameter of exponential survival times
$\lambda(t)$ Hazard function
$\lambda^{-}$ Lower tail dependence
$\lambda^{+}$ Upper tail dependence
$\Lambda(x)$ Gumbel distribution
$\Lambda(t)$ Markov generator
$\mathrm{MDA}(G)$ Maximum domain of attraction of the extreme value distribution $G$
$\mathrm{MES}_i$ Marginal expected shortfall of institution $i$
$\mathrm{MPE}_{\alpha}(0;t)$ Maximum peak exposure for the time period $(0,t]$ with a confidence level $\alpha$
$\mathrm{MR}_i$ Marginal risk of asset $i$
$\mathrm{MtM}$ Mark-to-market of the portfolio
$\mu$ Vector of expected returns $(\mu_1, \ldots, \mu_n)$
$\mu_i$ Expected return of asset $i$
$\mu_m$ Expected return of the market portfolio
$\hat{\mu}$ Empirical mean
$\mu(w)$ Expected return of portfolio $w$
$\mu(X)$ Mean of the random vector $X$
$\mu_m(X)$ $m$-th centered moment of the random vector $X$
$\mu'_m(X)$ $m$-th moment of the random vector $X$
$\mathcal{N}(\mu,\sigma^2)$ Normal distribution with mean $\mu$ and standard deviation $\sigma$
$\mathcal{N}(\mu,\Sigma)$ Multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$
$n_S$ Number of scenarios or simulations
$N(t)$ Poisson counting process for the time interval $[0,t]$
$N(t_1;t_2)$ Poisson counting process for the time interval $[t_1,t_2]$
$\mathcal{NB}(r,p)$ Negative binomial distribution with parameters $r$ and $p$
$\Omega$ Covariance matrix of risk factors
$P$ Markov transition matrix
$\mathbb{P}$ Historical probability measure
$P_{\Sigma}$ Cholesky decomposition of $\Sigma$
$\mathcal{P}(\lambda)$ Poisson distribution with parameter $\lambda$
$p(k)$ Probability mass function of an integer-valued random variable
$P_t$ Price of the put option at time $t$
$\mathcal{P}(\alpha,x_{-})$ Pareto distribution with parameters $\alpha$ and $x_{-}$
$\mathcal{P}(\alpha,\theta)$ Pareto distribution with parameters $\alpha$ and $\theta$
$\mathrm{PE}_{\alpha}(t)$ Peak exposure at time $t$ with a confidence level $\alpha$
$\mathrm{PV}_t(\mathcal{L})$ Present value of the leg $\mathcal{L}$
$\Pi$ or $\Pi(w)$ P&L of the portfolio $w$
$\phi(x)$ Probability density function of the standardized normal distribution
$\phi_2(x_1,x_2;\rho)$ Probability density function of the bivariate normal distribution with correlation $\rho$
$\phi_n(x;\Sigma)$ Probability density function of the multivariate normal distribution with covariance matrix $\Sigma$
$\Phi(x)$ Cumulative distribution function of the standardized normal distribution
$\Phi^{-1}(\alpha)$ Inverse of the cdf of the standardized normal distribution
$\Phi_2(x_1,x_2;\rho)$ Cumulative distribution function of the bivariate normal distribution with correlation $\rho$
$\Phi_n(x;\Sigma)$ Cumulative distribution function of the multivariate normal distribution with covariance matrix $\Sigma$
$\Phi_{\alpha}(x)$ Fréchet distribution
$\Psi_{\alpha}(x)$ Weibull distribution
$\varphi_X(t)$ Characteristic function of the random variable $X$
$q_{\alpha}(n_S)$ Integer part of $\alpha n_S$
$\bar{q}_{\alpha}(n_S)$ Integer part of $(1-\alpha) n_S$
$\mathbb{Q}$ Risk-neutral probability measure
$\mathbb{Q}_T$ Forward probability measure
$R(t)$ Rating of the entity at time $t$
$r$ Return of the risk-free asset
$R$ Vector of asset returns $(R_1, \ldots, R_n)$
$R_i$ Return of asset $i$
$R_{i,t}$ Return of asset $i$ at time $t$
$R_{m,t}$ Return of the market portfolio at time $t$
$R(w)$ Return of portfolio $w$
$\mathcal{R}(w)$ Risk measure of portfolio $w$
$\mathcal{R}(L)$ Risk measure of loss $L$
$\mathcal{R}(\Pi)$ Risk measure of P&L $\Pi$
$R_t(T)$ Zero-coupon rate at time $t$ for the maturity $T$
$\mathcal{RC}_i$ Risk contribution of asset $i$
$\mathcal{RC}_i^{\star}$ Relative risk contribution of asset $i$
$\boldsymbol{\mathcal{R}}$ Recovery rate
$\mathrm{RPV}_{01}$ Risky PV01
$\rho$ (or $C$) Correlation matrix of asset returns
$\rho_{i,j}$ Correlation between asset returns $i$ and $j$
$\rho(x,y)$ Correlation between portfolios $x$ and $y$
$s$ Credit spread
$\mathbf{S}(x)$ Survival function
$S$ Stress scenario
$S_t$ Price of the underlying asset at time $t$
$\mathbf{S}_t(T)$ Survival function of $T$ at time $t$
$\mathbf{S}(t,u)$ Amortization function of the new production
$\mathbf{S}^{\star}(t,u)$ Amortization function of the production stock
$S(y_t)$ Stationary form of the process $y_t$
$\mathrm{SES}_i$ Systemic expected shortfall of institution $i$
$\mathcal{SN}(\xi,\Omega,\eta)$ Skew normal distribution
$\mathrm{SRISK}_i$ Systemic risk contribution of institution $i$
$\mathcal{ST}(\xi,\Omega,\eta,\nu)$ Skew $t$ distribution
$\mathrm{SV}_t(\mathcal{L})$ Stochastic discounted value of the leg $\mathcal{L}$
$\Sigma$ Covariance matrix
$\hat{\Sigma}$ Empirical covariance matrix
$\sigma_i$ Volatility of asset $i$
$\sigma_m$ Volatility of the market portfolio
$\tilde{\sigma}_i$ Idiosyncratic volatility of asset $i$
$\hat{\sigma}$ Empirical volatility
$\sigma(w)$ Volatility of portfolio $w$
$\sigma(X)$ Standard deviation of the random variable $X$
$t_{\nu}$ Student’s $t$ distribution with $\nu$ degrees of freedom
$t_n(\Sigma,\nu)$ Multivariate Student’s $t$ distribution with $\nu$ degrees of freedom and covariance matrix $\Sigma$
$t(x;\nu)$ Probability density function of the univariate $t$ distribution with number of degrees of freedom $\nu$
$t_n(x;\Sigma,\nu)$ Probability density function of the multivariate $t$ distribution with parameters $\Sigma$ and $\nu$
$t_2(x_1,x_2;\rho,\nu)$ Probability density function of the bivariate $t$ distribution with parameters $\rho$ and $\nu$
$T$ Maturity date
$\mathbf{T}(x;\nu)$ Cumulative distribution function of the univariate $t$ distribution with number of degrees of freedom $\nu$
$\mathbf{T}^{-1}(\alpha;\nu)$ Inverse of the cdf of the Student’s $t$ distribution with $\nu$ the number of degrees of freedom
$\mathbf{T}_n(x;\Sigma,\nu)$ Cumulative distribution function of the multivariate $t$ distribution with parameters $\Sigma$ and $\nu$
$\mathbf{T}_2(x_1,x_2;\rho,\nu)$ Cumulative distribution function of the bivariate $t$ distribution with parameters $\rho$ and $\nu$
$\mathcal{T}$ Return period
$\mathrm{tr}(A)$ Trace of the matrix $A$
$\theta$ Vector of parameters
$\hat{\theta}$ Estimator of $\theta$
$\Theta_t$ Theta of the option at time $t$
$\boldsymbol{\tau}$ Default time
$\tau$ Time to maturity $T - t$
$\mathcal{U}_{[a,b]}$ Uniform distribution between $a$ and $b$
$\mathrm{var}(X)$ Variance of the random variable $X$
$\mathrm{VaR}_{\alpha}(w)$ Value-at-risk of portfolio $w$ at the confidence level $\alpha$
$\upsilon_t$ Vega of the option at time $t$
$w$ Vector of weights $(w_1, \ldots, w_n)$ for portfolio $w$
$w_i$ Weight of asset $i$ in portfolio $w$
$W(t)$ Wiener process
$X$ Random variable
$x^{+}$ Maximum value between $x$ and 0
$X_{i:n}$ $i$-th order statistic of a sample of size $n$
$y_t$ Discrete-time stochastic process
$y$ Yield to maturity

Abbreviations

ABCP Asset-backed commercial paper
ABS Asset-backed security
ADF Augmented Dickey-Fuller unit root test
ADV Average daily volume
AER Annual equivalent rate
AFME Association for Financial Markets in Europe
AFS Available-for-sale
AIC Akaike information criterion
AIFMD Alternative investment fund managers directive
AIRB Advanced internal ratings-based approach (credit risk)
ALCO ALM committee
ALM Asset liability management
AM-CVA Advanced method (credit valuation adjustment)
AMA Advanced measurement approaches (operational risk)
AMF Autorité des Marchés Financiers
AMLF ABCP money market mutual fund liquidity facility
AR Autoregressive process
ARCH Autoregressive conditional heteroscedasticity process
ARMA Autoregressive moving average process
AT1 Additional tier 1
ATM At-the-money (option)
BA-CVA Basic approach (credit valuation adjustment)
BAC Binary asset-or-nothing call option
BaFin Bundesanstalt für Finanzdienstleistungsaufsicht
BAP Binary asset-or-nothing put option
BCBS Basel Committee on Banking Supervision
BCC Binary cash-or-nothing call option
BCP Binary cash-or-nothing put option
BCVA Bilateral CVA
BD Broker-dealer
BFGS Broyden-Fletcher-Goldfarb-Shanno algorithm
BGD Batch gradient descent
BIA Basic indicator approach (operational risk)
BIS Bank for International Settlements
BLUE Best linear unbiased estimator
BoJ Bank of Japan
BS Black-Scholes model
BSM Basic structural model
BUE Best unbiased estimator
CAD Capital adequacy directive
CAM Constant amortization mortgage
CaR Capital-at-risk
CB Conservation buffer (CET1)
CBO Collateralized bond obligation
CCB Countercyclical capital buffer (CET1)
CCF Credit conversion factor
CCP Central counterparty clearing house
CCR Counterparty credit risk
CDF Cumulative distribution function
CDO Collateralized debt obligation
CDS Credit default swap
CDT Credit default tranche
CDX Credit default index
CE Current exposure
CEM Current exposure method (CCR)
CET1 Common equity tier 1
CFH Cash flow hedge
CFI Captive financial institution
CFO Chief financial officer
CGFS Committee on the Global Financial System
CIR Cox-Ingersoll-Ross process
CISC Constant inter-sector correlation model
CLO Collateralized loan obligation
CMBS Commercial mortgage-backed security
CMO Collateralized mortgage obligation
CoVaR Conditional value-at-risk
CP Consultation paper
CPM Constant payment mortgage
CPR Conditional prepayment rate
CRA Credit rating agency
CRD Capital requirements directive
CRM Comprehensive risk measure
CRO Chief risk officer
CRR Capital requirements regulation
CSRBB Credit spread risk in the banking book
CVA Credit valuation adjustment
DF Dickey-Fuller unit root test
DFAST Dodd-Frank Act stress testing
DFP Davidon-Fletcher-Powell algorithm
DFT Discrete Fourier transform
DGAP Duration gap
DIC Down-and-in call option
DIP Down-and-in put option
DOC Down-and-out call option
DOP Down-and-out put option
DP Dynamic programming
DRC Default risk capital
DV01 Dollar value of a one basis point decrease in interest rates
DVA Debit valuation adjustment
EAD Exposure at default
EaR Earnings-at-risk
EAR Effective annual rate
EBA European Banking Authority


ECB European Central Bank
ECM Error correction model
ECRA External credit risk assessment
EE Expected exposure
EEE Effective expected exposure
EEPE Effective expected positive exposure
EL Expected loss
EMIR European market infrastructure regulation
ENE Expected negative exposure
EPE Expected positive exposure
ERBA External ratings-based approach
ES Expected shortfall
ESMA European Securities and Markets Authority
ETF Exchange traded fund
EV Economic value
EVaR Economic value-at-risk
EVE Economic value of equity
EVT Extreme value theory
FASB Financial Accounting Standards Board
FBA Fall-back approach
FC Finance company
FDIC Federal Deposit Insurance Corporation
FDML Frequency domain maximum likelihood
FFT Fast Fourier transform
FHFA Federal Housing Finance Agency
FICO Fair Isaac Corporation score
FIR Finite impulse response filter
FIRB Foundation internal ratings-based approach (credit risk)
FNMA Fannie Mae
FRA Forward rate agreement
FRB Board of Governors of the Federal Reserve System
FRTB Fundamental review of the trading book
FSAP Financial sector assessment program
FSB Financial Stability Board
FtD First-to-default swap
FTP Funds transfer pricing
FV Fair value
FVA Funding valuation adjustment
FVH Fair value hedge
FVOCI Fair value through other comprehensive income
FVTPL Fair value through profit and loss
FWN Fractional white noise
GAAP Generally accepted accounting principles (US)
GARCH Generalized autoregressive conditional heteroscedasticity process
GBM Geometric Brownian motion
GCV Generalized cross-validation
GEV Generalized extreme value distribution
GFC Global Financial Crisis (2008)
GMM Generalized method of moments
GMM Gaussian mixture model
GNMA Ginnie Mae
GPD Generalized Pareto distribution
HELOC Home equity line of credit
HF Hedge fund
HFT Held-for-trading
HJM Heath-Jarrow-Morton model
HLA Higher loss absorbency
HPP Homogeneous Poisson process
HQLA High-quality liquid assets
HTM Held-to-maturity
HY High yield entity
IAIS International Association of Insurance Supervisors
IAS International accounting standards
ICAAP Internal capital adequacy assessment process
ICP Insurance Core Principles
ICPF Insurance companies and pension funds
IF Investment fund
IFG Infinitely fine-grained portfolio
IFRS International financial reporting standards
IG Investment grade entity
ILAAP Internal liquidity adequacy assessment process
IMA Internal model-based approach (market risk)
IMCC Internally modelled capital charge (Basel III)
IMF International Monetary Fund
IMM Internal model method (counterparty credit risk)
IOSCO International Organization of Securities Commissions
IPP Integration by parts
IRB Internal ratings-based approach (credit risk)


IRRBB Interest rate risk in the banking book
IRC Incremental risk charge
IRS Interest rate swap
ISDA International Swaps and Derivatives Association
ITM In-the-money (option)
JTD Jump-to-default
KF Kalman filter
KIC Knock-in call option
KIP Knock-in put option
KOC Knock-out call option
KOP Knock-out put option
KPSS Kwiatkowski-Phillips-Schmidt-Shin stationary test
KRI Key risk indicator
L&R Loans and receivables
LAD Least absolute deviation estimator
LCG Linear congruential generator
LCR Liquidity coverage ratio
LDA Loss distribution approach (operational risk)
LDA Linear discriminant analysis
LDCE Loss data collection exercise
LEE Loan equivalent exposure
LGD Loss given default
LL Local level model
LLT Local linear trend model
LMM Libor market model
LTA Look-through approach
LtD Last-to-default swap
LTI Linear time-invariant filter
LTV Loan-to-value ratio
M Effective maturity
MA Moving average process
MBS Mortgage-backed security
MC Monte Carlo
MCMC Markov chain Monte Carlo
MCR Minimum capital requirement
MDA Maximum domain of attraction
MDB Multilateral development bank
MES Marginal expected shortfall
MEV Multivariate extreme value
MF Mutual fund
MGD Mini-batch gradient descent
MiFID Markets in financial instruments directive
MiFIR Markets in financial instruments regulation
ML Maximum likelihood
MLE Maximum likelihood estimator
MM Method of moments
MMF Money market fund
MPE Maximum peak exposure
MPOR Margin period of risk
MPP Mixed Poisson process
MSMVE Min-stable multivariate exponential distribution
MtM Mark-to-market
MUNFI Monitoring universe of non-bank financial intermediation
NHPP Non-homogeneous Poisson process
NIH Net investment hedge
NII Net interest income
NIM Net interest margin
NIS Net interest spread
NMD Non-maturity deposit
NMF Non-negative matrix factorization
NN Neural network
NOW Negotiable order of withdrawal
NQD Negative quadrant dependence
NSFR Net stable funding ratio
OCC Office of the Comptroller of the Currency
ODE Ordinary differential equation
OFI Other financial intermediary
OLS Ordinary least squares
ORSA Own risk and solvency assessment
OTC Over-the-counter
OTM Out-of-the-money (option)
OTS Office of Thrift Supervision
OU Ornstein-Uhlenbeck process
P&L Profit and loss
PCA Principal component analysis
PD Probability of default
PDE Partial differential equation
PDF Probability density function
PE Peak exposure
PFE Potential future exposure
PLA Profit and loss attribution (Basel III)
PMF Probability mass function
POT Peak over threshold
PP Phillips-Perron unit root test
PQD Positive quadrant dependence
PRESS Predicted residual error sum of squares
PSE Public sector entity
PV01 Present value of one bp
QDA Quadratic discriminant analysis
QIS Quantitative impact study
QMC Quasi-Monte Carlo


QP Quadratic programming
RBC Risk-based capital (US insurance)
REIT Real estate investment trust
RFET Risk factor eligibility test (Basel III)
RLS Recursive least squares
RMBS Residential mortgage-backed security
ROE Return-on-equity
RRAO Residual risk add-on
RW Risk weight
RWA Risk-weighted asset
RWR Right way risk
SA Standardized approach (credit risk)
SA-CCR Standardized approach (counterparty credit risk)
SA-CVA Standardized approach (credit valuation adjustment)
SA-TB Standardized approach for trading book (market risk)
SABR Stochastic alpha-beta-rho model
SBE Shadow banking entity
SBS Shadow banking system
SCR Solvency capital requirement
SCRA Standardized credit risk approach
SDE Stochastic differential equation
SES Systemic expected shortfall
SFT Securities financing transaction
SFV Structured finance vehicle
SGD Stochastic gradient descent
SIFI Systemically important financial
SIFMA
SIR
SIS
SIV
SLA
SLN
SMC
SME
SMM
SMM
SM-CCR Standardized method
Introduction


The idea that risk management creates value is largely accepted today. However, this has 
not always been the case in the past, especially in the financial sector (Stulz, 1996). Rather, 
it has been a long march marked by a number of decisive steps. In this introduction, we 
present an outline of the most important achievements from a historical point of view. We 
also give an overview of the current financial regulation, which is a cornerstone in financial 
risk management. 


1.1 The need for risk management 


The need for risk management is the title of the first section of the leading book
by Jorion (2007), who shows that risk management can be justified at two levels. At the
firm level, risk management is essential for identifying and managing business risk. At
the industry level, risk management is a central factor for understanding and preventing
systemic risk. In particular, this second need is the ‘raison d’être’ of the financial regulation
itself.


1.1.1 Risk management and the financial system 


The concept of risk management has evolved considerably since its creation, which is
believed to be in the early fifties¹. In November 1955, Wayne Snider gave a lecture entitled
‘The Risk Manager’ where he proposed creating an integrated department responsible for
risk prevention in the insurance industry (Snider, 1956). Some months later, Gallagher
(1956) published an article to outline the most important principles of risk management
and to propose the hiring of a full-time risk manager in large companies. For a long time,
risk management was systematically associated with insurance management, from both a
practical and a theoretical point of view. For instance, the book of Mehr and
Hedges (1963) is largely dedicated to the field of insurance with very few applications to
other industries. This is explained by the fact that the collective risk model² has helped
to apply the mathematical and statistical tools for measuring risk in insurance companies
since 1930. A new discipline known as actuarial science developed at the same
time, outside the other sciences, and supported the generalization of risk management in
the insurance industry.


Simultaneously, risk became an important field of research in economics and finance. 
Indeed, Arrow (1964) made an important step by extending the Arrow-Debreu model of 
general equilibrium in an uncertain environment’. In particular, he showed the importance 


'See Crockford (1982) or Snider (1991) for a retrospective view on the risk management development. 
?It is also known as the ruin theory or the compound Poisson risk model. 
3This paper was originally presented in 1952 and was also published in Cahiers du CNRS (1953). 
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of hedging and introduced the concept of payoff. By developing the theory of optimal al- 
location for a universe of financial securities, Markowitz (1952) pointed out that the risk 
of a financial portfolio can be diversified. These two concepts, hedging and diversification, 
together with insurance, are the main pillars of modern risk management. These concepts 
will be intensively used by academics in the 1960s and 1970s. In particular, Black and Sc- 
holes (1973) showed the interconnection between hedging and pricing problems. Their work 
had a strong impact on the development of equity, interest rates, currency and commodity 
derivatives, which are today essential for managing the risk of financial institutions. With 
the Markowitz model, a new era had begun in portfolio management and asset pricing. 
First, Sharpe (1964) showed how risk premia are related to non-diversifiable risks and 
developed the first asset pricing model. Then, Ross (1976) extended the CAPM of 
Sharpe and highlighted the role of risk factors in arbitrage pricing theory. These academic 
achievements supported the further development of asset management, financial markets 
and investment banking. 


In commercial and retail banking, risk management was not integrated until recently. 
Even though credit scoring models have existed since the fifties, they were rather designed 
for consumer lending, especially credit cards. When banks used them for loans and credit 
issuances, they were greatly simplified and considered as a decision-making tool, playing a 
minor role in the final decision. The underlying idea was that the banker knew his client 
better than a statistical model could. However, Bankers Trust introduced the concept of 
risk-adjusted return on capital or RAROC under the initiative of Charles Sanford in the 
late 1970s for measuring risk-adjusted profitability. Gene Guill mentions a memorandum 
dated February 1979 by Charles Sanford to the head of bank supervision at the Federal 
Reserve Bank of New York that helps to understand the RAROC approach: 


“We agree that one bank’s book equity to assets ratio has little relevance for 
another bank with a different mix of businesses. Certain activities are inherently 
riskier than others and more risk capital is required to sustain them. The truly 
scarce resource is equity, not assets, which is why we prefer to compare and 
measure businesses on the basis of return on equity rather than return on assets” 
(Guill, 2009, page 10). 


RAROC compares the expected return to the economic capital and has become a standard 
model for combining performance management and risk management. Even if RAROC is a 
global approach for allocating capital between business lines, it has been mainly used as a 
credit scoring model. Another milestone was the development of credit portfolio manage- 
ment when Vasicek (1987) adapted the structural default risk approach of Merton (1974) 
to model the loss distribution of a loan portfolio. He then jointly founded KMV Corpora- 
tion with Stephen Kealhofer and John McQuown, which specializes in quantitative credit 
analysis tools and is now part of Moody’s Analytics. 
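
As a rough illustration of the RAROC idea described above, the following Python sketch ranks two business lines by the ratio of expected return to economic capital. The function name, the business lines and all figures are hypothetical and chosen only for this example; they do not come from Bankers Trust or from Guill (2009), and RAROC definitions used in practice typically adjust the numerator for expected losses and funding costs.

```python
def raroc(expected_return: float, economic_capital: float) -> float:
    """Simplified RAROC: expected return divided by economic capital."""
    if economic_capital <= 0:
        raise ValueError("economic capital must be positive")
    return expected_return / economic_capital


# Hypothetical business lines: (expected return, economic capital), in $ mn.
business_lines = {
    "retail lending": (12.0, 100.0),
    "trading desk": (30.0, 400.0),
}

# Rank business lines from the highest to the lowest risk-adjusted return.
ranking = sorted(
    business_lines,
    key=lambda name: raroc(*business_lines[name]),
    reverse=True,
)

for name in ranking:
    print(f"{name}: RAROC = {raroc(*business_lines[name]):.1%}")
```

In this toy setting, the trading desk earns more in absolute terms but consumes four times more capital, so it ranks below retail lending on a risk-adjusted basis.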


In addition to credit risk, commercial and retail banks have to manage interest rate 
and liquidity risks, because their primary activity is to do asset, liquidity and maturity 
transformations. Typically, a commercial bank has long-term and illiquid assets (loans) and 
short-term and liquid liabilities (deposits). In such a situation, a bank faces a loss risk that 
can be partially hedged. This is the role of asset liability management (ALM). But depositors 
also face a loss risk that is virtually impossible to monitor and manage. Consequently, there 
is an information asymmetry between banks and depositors. 


In the banking sector, the main issue therefore centered on deposit insurance. 
How can we protect depositors against the failure of the bank? The 100% reserve proposal 
by Fisher (1935) required banks to keep 100% of demand deposit accounts in cash or 
government-issued money like bills. Diamond and Dybvig (1983) argued that the mixing 
policy of liquid and illiquid assets can rationally produce systemic risks, such as bank runs. 
A better way to protect the depositors is to create a deposit insurance guaranteed by the 
government. According to the Modigliani-Miller theorem on capital structure⁴, this type of 
government guarantee implies a higher cost of equity capital. Since the eighties, this topic 
has been widely discussed (Admati and Hellwig, 2014). Moreover, banks also differ 
from other companies, because they create money. Therefore, they are at the heart of the 
monetary policy. These two characteristics (implicit guarantee and money creation) imply 
that banks have to be regulated and need regulatory capital. This is all the more valid with 
the huge development of financial innovations, which has profoundly changed the nature of 
the banking system and the risk. 


1.1.2 The development of financial markets 


The development of financial markets has a long history. For instance, the Chicago Board 
of Trade (CBOT) listed the first commodity futures contract in 1864 (Carlton, 1984). Some 
authors even consider that the first organized futures exchange was the Dojima Rice Market 
in Osaka in the 18th century (Schaede, 1989). But the most important breakthrough came 
in the seventies with two major financial innovations. In 1972, the Chicago Mercantile 
Exchange (CME) launched currency futures contracts after the US had decided to abandon 
the fixed exchange rate system of Bretton Woods (1946). The oil crisis of 1973 and the need 
to hedge currency risk have considerably helped in the development of this market. After 
commodity and currency contracts, interest rate and equity index futures have consistently 
grown. For instance, US Treasury bond, S&P 500, German Bund, and Euro Stoxx 50 futures 
were first traded in 1977, 1982, 1988 and 1998 respectively. Today, the Bund futures contract 
is the most traded product in the world. 


The second main innovation in the seventies concerned option contracts. The CBOT 
created the Chicago Board Options Exchange (CBOE) in 1973, which was the first exchange 
specialized in listed stock call options. The same year, Black and Scholes (1973) published 
their famous formula for pricing a European option. It has been the starting point of the 
intensive development of academic research concerning the pricing of financial derivatives 
and contingent claims. The works of Fisher Black, Myron Scholes and Robert Merton⁵ are 
all the more significant in that they consider the pricing problem in terms of risk hedging. 
Many authors had previously found a similar pricing formula, but Black and Scholes in- 
troduced the revolutionary concept of the hedging portfolio. In their model, they derived 
the corresponding dynamic trading strategy to hedge the option contract, and the option 
price is therefore equivalent to the cost of the hedging strategy. Their pricing method had 
a great influence on the development of the derivatives market and more exotic options, in 
particular path-dependent options⁶. 
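
To make the link between pricing and hedging concrete, here is a minimal Python sketch of the standard textbook Black-Scholes formula for a European call on a non-dividend-paying stock. It also returns the delta, that is the number of shares held in the hedging portfolio. The function names and the parameter values are assumptions chosen for illustration only.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot: float, strike: float, maturity: float, rate: float, vol: float):
    """Price and delta of a European call in the Black-Scholes model (no dividends)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    price = spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)
    delta = norm_cdf(d1)  # number of shares held in the hedging portfolio
    return price, delta


# Illustrative (hypothetical) parameters: at-the-money one-year call.
spot = 100.0
price, delta = bs_call(spot=spot, strike=100.0, maturity=1.0, rate=0.05, vol=0.20)

# The hedging portfolio holds `delta` shares financed partly by borrowing:
# its initial value (shares plus cash) equals the option price.
cash = price - delta * spot
print(f"price = {price:.4f}, delta = {delta:.4f}, cash position = {cash:.4f}")
```

The cash position is negative here: the hedger borrows to buy the shares, and the option premium is exactly the amount needed to set up this portfolio.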


Whereas the primary goal of options is to hedge a directional risk, they have been widely 
used as underlying assets of investment products. In 1976, Hayne Leland and Mark Rubinstein 
developed the portfolio insurance concept, which allows for investing in risky assets 
while protecting the capital of the investment. In 1980, they founded LOR Associates, Inc. 
with John O’Brien and proposed structured investment products to institutional investors 
(Tufano and Kyrillos, 1995). They achieved very rapid growth until the 1987 stock market 


⁴Under some (unrealistic) assumptions, Modigliani and Miller (1958) showed that the market value of a 
firm is not affected by how that firm is financed (by issuing stock or debt). They also established that the 
cost of equity is a linear function of the firm’s leverage measured by its debt/equity ratio. 

⁵As shown by Bernstein (1992), the works of Black and Scholes cannot be dissociated from the research 
of Merton (1973). This explains why they both received the 1997 Nobel Prize in Economics for their option 
pricing model. 

⁶See Box 1 for more information about the rise of exotic options. 
crash⁷, and were followed by Wells Fargo, J.P. Morgan and Chase Manhattan as well as 
other investment banks. This period marks the start of financial engineering applied to 
structured products and the development of popular trading strategies, such as constant 
proportion portfolio insurance (CPPI) and option-based portfolio insurance (OBPI). Later, 
they were extensively used for designing retail investment products, especially capital 
guaranteed products. 
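
The CPPI mechanism mentioned above can be sketched in a few lines of Python. This is a simplified version under stated assumptions: rebalancing happens once per period, the exposure to the risky asset is a constant multiple of the cushion (portfolio value minus floor), leverage is not allowed, and the floor grows at the safe rate. The function name, the parameter values and the return scenario are hypothetical.

```python
def cppi_path(risky_returns, initial_value=100.0, floor_ratio=0.9,
              multiplier=4.0, safe_rate=0.0):
    """Simulate CPPI portfolio values over a sequence of risky-asset returns."""
    value = initial_value
    floor = floor_ratio * initial_value
    values = [value]
    for risky_return in risky_returns:
        cushion = max(value - floor, 0.0)
        # Risky exposure is a multiple of the cushion, capped at the
        # portfolio value (no leverage in this sketch).
        exposure = min(multiplier * cushion, value)
        safe_holding = value - exposure
        value = exposure * (1.0 + risky_return) + safe_holding * (1.0 + safe_rate)
        floor *= 1.0 + safe_rate  # the protected amount accrues at the safe rate
        values.append(value)
    return values


# Hypothetical scenario: two consecutive 10% drops followed by a 5% rebound.
values = cppi_path([-0.10, -0.10, 0.05])
print([round(v, 2) for v in values])
```

As long as the risky asset does not lose more than 1/multiplier (25% here) within a single rebalancing period, the portfolio value stays above the floor, which is how the strategy protects the capital while keeping some exposure to the risky asset.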
Evolution of financial innovations 


1864 Commodity futures 

1970 Mortgage-backed securities 

1971 Equity index funds 

1972 Foreign currency futures 

1973 Stock options 

1977 Put options 

1979 Over-the-counter currency options 

1980 Currency swaps 

1981 Interest rate swaps 

1982 Equity index futures 

1983 Equity index options 
Interest rate caps/floors 
Collateralized mortgage obligations 

1985 Swaptions 
Asset-backed securities 

1987 Path-dependent options (Asian, look-back, etc.) 
Collateralized debt obligations 

1992 Catastrophe insurance futures and options 

1993 Captions/floortions 
Exchange-traded funds 

1994 Credit default swaps 

1996 Electricity futures 

1997 Weather derivatives 

2004 Volatility index futures 

2006 Leveraged and inverse ETFs 

2008 Green bonds 

2009 Crypto currencies 




Source: Jorion (2007) and author’s research. 


After options, the next great innovation in risk management was the swap. In a swap 
contract, two counterparties exchange a series of cash flows of one financial instrument for 
those of another financial instrument. For instance, an interest rate swap (IRS) is an 
exchange of interest rate cash flows from a fixed rate to a floating rate or between two floating 


⁷In fact, portfolio insurance was blamed by the Brady Commission report (1988) for the stock market 
crash of October 1987. See for instance Leland and Rubinstein (1988), Shiller (1987), Gennotte and Leland 
(1990) and Jacklin et al. (1992) for discussions about the impact of portfolio insurance on the October 
1987 crash. 
rates. Swaps have become an important tool for managing balance sheets, in particular 
interest rate and currency risks in the banking book. The original mechanism of cash flow 
exchanges has been extended to other instruments and underlying assets: inflation-indexed 
bonds, stocks, equity indices, commodities, etc. But one of the most significant advances in 
financial innovations was the creation of credit default swaps (CDS) in the mid-nineties, and 
more generally credit derivatives. In the simplest case, the cash flows depend on the default 
of a loan, a bond or a company. We refer then to single-name instruments. Otherwise, they 
depend on credit events or credit losses of a portfolio (multi-name instruments). However, 
the development of credit derivatives was made possible thanks to securitization. This is 
a process through which assets are pooled in a portfolio and securities representing inter- 
ests in the portfolio are issued. Securities backed by mortgages are called mortgage-backed 
securities (MBS), while those backed by other types of assets are asset-backed securities 
(ABS). 
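
Returning to the plain fixed-for-floating interest rate swap defined at the beginning of this paragraph, the exchange of cash flows can be illustrated with a short Python sketch. It computes the net amount received at each payment date by the counterparty that pays the fixed rate and receives the floating rate. The function name, the notional, the rates and the semi-annual accrual factor are hypothetical, and day-count conventions and discounting are deliberately ignored.

```python
def fixed_payer_net_cash_flows(notional, fixed_rate, floating_rates, accrual=0.5):
    """Net cash flows received by the fixed-rate payer at each payment date.

    A positive value means the fixed-rate payer receives money (the floating
    rate is above the fixed rate); a negative value means it pays.
    """
    return [
        notional * (floating_rate - fixed_rate) * accrual
        for floating_rate in floating_rates
    ]


# Hypothetical swap: $1 mn notional, 3% fixed rate, semi-annual payments.
flows = fixed_payer_net_cash_flows(
    notional=1_000_000,
    fixed_rate=0.03,
    floating_rates=[0.025, 0.03, 0.04],
)
print([round(flow, 2) for flow in flows])
```

Only the net difference is exchanged at each date, which is why the notional amount of a swap overstates the money actually at risk.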

Derivatives are traded either in organized markets or in over-the-counter markets (OTC). 
In organized exchanges, the contracts are standardized and the transactions are arranged 
by the clearing house, which is in charge of clearing and settlement. By contrast, in OTC 
markets, the contracts are customized and the trades are done directly by the two counter- 
parties. This implies that OTC trades are exposed to the default risk of the participants. 
The location of derivatives trades depends on the contract: 


Contract        Futures   Forward   Option   Swap 
On-exchange        ✓                   ✓ 
Off-exchange                 ✓         ✓       ✓ 


For instance, the only difference between futures and forward contracts is that futures are 
traded in organized markets whereas forwards are traded over-the-counter. Contrary to 
options which are negotiated in both markets, swaps are mainly traded OTC. In Table 1.1, 
we report the outstanding amount of exchange-traded derivatives concerning futures and 
options published by the Bank for International Settlements (2019). In December 2018, 
their notional amount is equal to $94.8 tn, composed of $39.0 tn in futures (41.2%) and 
$55.7 tn in options (58.8%). For each instrument, we indicate the split between interest 
rates and currencies⁸. We notice that exchange-traded derivatives on interest rates are the 
main contributor. The evolution of the total notional amount is reported in Figure 1.1. The 
size of exchange-traded derivative markets has grown rapidly since 2000, peaking in June 
2007 with an aggregated amount of $86.6 tn. This trend ended with the financial crisis since 
we observe a decrease between 2007 and 2016. It is only recently that the outstanding 
amount of exchange-traded derivatives has exceeded the 2007 figure. 


Statistics⁹ concerning OTC derivative markets are given in Table 1.2. These markets 
are between six and ten times bigger than exchange-traded markets in terms of outstanding 
amount (Figure 1.3). In June 2018, the aggregated amount of forwards, swaps and options 
is equal to $594.8 tn. Contrary to exchange-traded derivative markets, the notional 
outstanding amount of OTC derivative markets continued to increase after the crisis period, 
but has declined since 2014 (Figure 1.2). In terms of instruments, swaps dominate and 
represent 65.0% of the total. Like in exchange-traded markets, the main asset class remains 
fixed income. We also notice the impact of the 2008 financial crisis on credit default swaps, 


⁸The BIS decided in September 2015 to discontinue the compilation of equity index exchange-traded 
derivatives statistics. This is why these statistics do not include the equity index futures and options. In 
December 2014, equity index futures and options represented 11.1% of exchange-traded derivatives. 

⁹In order to compute these statistics, we have made some assumptions because we do not have a perfect 
granularity of the data. For equity and commodity buckets, we do not have the split between forwards 
and swaps. We allocate 50% of the amount to each category. We also attribute the full amount of credit 
derivatives to the swap bucket. 
TABLE 1.1: Notional outstanding amount of exchange-traded derivatives 


                      2004     2007     2010     2014     2018 
Futures              42.6%    37.9%    34.1%    44.4%    41.2% 
  Interest rate      99.4%    99.3%    99.2%    99.1%    99.3% 
    Short-term       94.7%    94.0%    94.9%    93.6%    92.6% 
    Long-term         5.3%     6.0%     5.1%     6.4%     7.4% 
  Currency            0.6%     0.7%     0.8%     0.9%     0.7% 
Options              57.4%    62.1%    65.9%    55.6%    58.8% 
  Interest rate      99.8%    99.7%    99.6%    99.6%    99.8% 
    Short-term       98.2%    98.6%    98.9%    97.7%    98.3% 
    Long-term         1.9%     1.4%     1.1%     2.3%     1.7% 
  Currency            0.2%     0.3%     0.4%     0.5%     0.3% 
Total (in $ tn)       43.0     71.5     62.3     57.6     94.8 


Source: Bank for International Settlements (2019) and author’s calculations. 
FIGURE 1.1: Notional outstanding amount of exchange-traded derivatives (in $ tn) 


Source: Bank for International Settlements (2019) and author’s calculations. 
which represented more than 10% of the OTC derivative markets in December 2007. Ten 
years later, they represent less than 2.0% of these markets. 


TABLE 1.2: Notional outstanding amount of OTC derivatives 


                      2004     2007     2010     2014     2018 
Forwards             12.9%    11.8%    15.4%    20.2%    24.0% 
Swaps                71.1%    73.3%    73.2%    69.4%    65.0% 
Options              15.9%    14.9%    11.4%    10.3%    10.8% 
Unallocated           0.1%     0.0%     0.0%     0.1%     0.1% 

Currency             13.4%    11.4%    11.3%    13.1%    16.1% 
Interest rate        79.5%    73.8%    81.9%    82.8%    80.9% 
Equity                2.0%     1.6%     1.0%     1.1%     1.2% 
Commodity             0.6%     1.6%     0.6%     0.3%     0.4% 
Credit                4.5%    11.6%     5.2%     2.7%     1.4% 
Unallocated           0.1%     0.0%     0.0%     0.0%     0.0% 

Total (in $ tn)      258.6    585.9    601.0    627.8    594.8 


Source: Bank for International Settlements (2019) and author’s calculations. 


Whereas the notional outstanding amount is a statistic to understand the size of the 
derivatives markets, the risk and the activity of these markets may be measured by the 
gross market value and the turnover: 


• The gross market value of outstanding derivatives contracts represents “the cost of 
replacing all outstanding contracts at market prices prevailing on the reporting date. 
It corresponds to the maximum loss that market participants would incur if all counterparties 
failed to meet their contractual payments and the contracts were replaced 
at current market prices” (Bank for International Settlements, 2014). 


• The turnover is defined as “the gross value of all new deals entered into during a given 
period, and is measured in terms of the nominal or notional amount of the contracts. 
It provides a measure of market activity, and can also be seen as a rough proxy for 
market liquidity” (Bank for International Settlements, 2014). 


In June 2018, the gross market value is equal to $10.3 tn for OTC derivatives. It is much 
lower than the figure of $34.9 tn in December 2008. This decrease is explained by less 
complexity in derivatives, but also by a lower volatility regime. For OTC derivatives, it is 
difficult to measure a turnover, because the contracts are not standardized. This statistic is 
more pertinent for exchange-traded markets. In December 2018, the daily average turnover 
is equal to $8.1 tn for futures contracts and $1.8 tn for options. This means that each day, 
almost $10 tn of new derivative exposures are negotiated in exchange-traded markets. The 
consequence of this huge activity is a growing number of financial losses for banks and 
financial institutions (Reinhart and Rogoff, 2009). 


1.1.3 Financial crises and systemic risk 


A financial institution generally faces five main risks: (1) market risk, (2) credit risk, (3) 
counterparty credit risk, (4) operational risk and (5) liquidity risk. Market risk is the risk of 
losses due to changes in financial market prices. We generally distinguish four major types 
of market risk: equity risk, interest rate risk, currency risk and commodity risk. These risks 
are present in trading activities, but they also affect all activities that use financial assets. 
Credit risk is the risk of losses due to the failure of a counterparty to fulfill its contractual 
obligations, that is to make its required payments. It principally concerns debt transactions 
such as loans and bonds. Counterparty credit risk is another form of credit risk, but concerns 
the counterparty of OTC transactions. Examples include swaps and options, security lending 
or repo transactions. Operational risk is the risk of losses resulting from inadequate or failed 
internal processes, people and systems, or from external events. Examples of operational 
risk are frauds, natural disasters, business disruption, rogue trading, etc. Finally, liquidity 
risk is the risk of losses resulting from the failure of the financial institution to meet its 
obligations on time. This definition corresponds more to funding liquidity, but liquidity risk 
also concerns market liquidity, which is the cost to buy or sell assets on the market. 
A history of financial losses 


1974 Herstatt Bank: $620 mn (foreign exchange trading) 
1994 Metallgesellschaft: $1.3 bn (oil futures) 
1994 Orange County: $1.8 bn (reverse repo) 
1994 Procter & Gamble: $160 mn (ratchet swap) 
1995 Barings Bank: $1.3 bn (stock index futures) 
1997 Natwest: $127 mn (swaptions) 
1998 LTCM: $4.6 bn (liquidity crisis) 
2001 Dexia Bank: $270 mn (corporate bonds) 
2006 Amaranth Advisors: $6.5 bn (gas forward contracts) 
2007 Morgan Stanley: $9.0 bn (credit derivatives) 
2008 Société Générale: $7.2 bn (rogue trading) 
2008 Madoff: $65 bn (fraud) 
2011 UBS: $2.0 bn (rogue trading) 
2012 JPMorgan Chase: $5.8 bn (credit derivatives) 


Source: Jorion (2007) and author’s research. 


In Box 2, we have reported some famous financial losses. Most of them are related to the 
market risk or the operational risk¹⁰. In this case, these losses are said to be idiosyncratic 
because they are specific to a financial institution. Idiosyncratic risk is generally opposed to 
systemic risk: systemic risk refers to the system whereas idiosyncratic risk refers to an entity 
of the system. For instance, the banking system may collapse, because many banks may 
be affected by a severe common risk factor and may default at the same time. In financial 
theory, we generally make the assumption that idiosyncratic and common risk factors are 
independent. However, there exist some situations where idiosyncratic risk may affect the 
system itself. It is the case of large financial institutions, for example the default of big 
banks. In this situation, systemic risk refers to the propagation of the distress of a single 
bank to the other banks. 


¹⁰We have excluded the credit risk losses due to the 2008 global financial crisis. Even if the true cost of 
this crisis will never be known, it is very high, certainly larger than $10 tn. 
The case of Herstatt Bank is an example of an idiosyncratic risk that could result in a 
systemic risk. Herstatt Bank was a privately owned German bank. On 26 June 1974, the German 
Banking Supervisory Office withdrew Herstatt’s banking licence after finding that the bank’s 
foreign exchange exposures amounted to three times its capital (BCBS, 2014d). This episode 
of settlement risk caused heavy losses to other banks, adding a systemic dimension to the 
individual failure of Herstatt Bank. In response to this turmoil, the central bank governors 
of the G10 countries established the Basel Committee on Banking Supervision at the end 
of 1974 with the aim to enhance the financial stability at the global level. 


Even if the default of a non-financial institution is a dramatic event for employees, 
depositors, creditors and clients, the big issue is its impact on the economy. Generally, the failure 
of a company does not induce a macro-economic stress and is confined to a particular 
sector or region. For instance, the 2000s saw many bankruptcies, e.g. 
Pacific Gas and Electric Company (2001), Enron (2001), WorldCom (2002), Arthur Ander- 
sen (2002), Parmalat (2003), US Airways (2004), Delta Air Lines (2005), Chrysler (2009), 
General Motors (2009) and LyondellBasell (2009). However, the impact of these failures 
was contained within the immediate environment of the company and was not spread to 
the rest of the economy. 


In the financial sector, the issue is different because of the interconnectedness between 
the financial institutions and the direct impact on the economy. The issue is all the more 
relevant because the list of bankruptcies in finance is long, including, for example: Barings 
Bank (1995); HIH Insurance (2001); Conseco (2002); Bear Stearns (2008), Lehman Broth- 
ers (2008); Washington Mutual (2008); DSB Bank (2008). The number of banking and 
insurance distresses is even more impressive, for example: Northern Rock (2007); Coun- 
trywide Financial (2008); Indy Mac Bank (2008); Fannie Mae/Freddie Mac (2008); Merrill 
Lynch (2008); AIG (2008); Wachovia (2008); Depfa Bank (2008); Fortis (2009); Icelandic 
banks (2008-2010); Dexia (2011). In Figure 1.4, we report the number of bank failures com- 
puted by the Federal Deposit Insurance Corporation (FDIC), the organization in charge of 
insuring depositors in the US. We can clearly identify three periods of massive defaults¹¹: 
1935-1942, 1980-1994 and 2008-2014. Each period corresponds to a banking crisis¹² and 
lasts a long time because of delayed effects. Whereas the 1995-2007 period is characterized by a 
low default rate with no default in 2005-2006, there has been a significant number of bank defaults 
in recent years (517 defaults between 2008 and 2014). 


The Lehman Brothers collapse is a case study for understanding the systemic risk. 
Lehman Brothers filed for Chapter 11 bankruptcy protection on 15 September 2008 after 
incurring heavy credit and market risk losses implied by the US subprime mortgage crisis. 
The amount of losses is generally estimated to be about $600 bn, because Lehman Brothers 
had at this time $640 bn in assets and $620 bn in debt. However, the cost for the system is far 
greater than this figure. On equity markets, about $10 tn went missing in October 2008. The 
post-Lehman Brothers default period (from September to December 2008) is certainly one of 
the most extreme liquidity crises experienced in many decades. This forced central banks 
to use unconventional monetary policy measures by implementing quantitative easing (QE) 
programmes. For instance, the Fed now holds more than five times the amount of securities 
it held before September 2008. The collapse of Lehman Brothers had a huge impact on 
the banking industry, but also on the asset management industry. For instance, four days 
after the Lehman Brothers bankruptcy, the US government extended a temporary guarantee 
to money market funds. At the same time, the hedge fund industry suffered a lot because 
of the stress on the financial markets, but also because Lehman Brothers served as prime 
broker for many hedge funds. 


¹¹We define these periods when the yearly number of defaults is larger than 15. 
¹²They are the Great Depression, the savings and loan crisis of the 1980s and the subprime crisis. 
FIGURE 1.4: Number of bank defaults in the US 


Source: Federal Deposit Insurance Corporation, Historical Statistics on Banking — Failures & 
Assistance Transactions, www.fdic.gov/bank/individual/failed. 


The 2008 Global Financial Crisis also demonstrated that banks are not the only layer 
of systemic risk. In fact, a systemic risk implies that the entire financial system is seriously 
affected, but also participates in the creation of this risk: 


“[...] there are both old and new components in both the origins and the prop- 
agation of the subprime shock. Old components include government financial 
subsidies for bearing risk, accommodative monetary policy, and adverse selec- 
tion facilitated by asymmetric information. New components include the central 
role of agency problems in asset management, the ability of financial institutions 
to raise new capital from external sources, the activist role of the United States 
Treasury Department and Federal Reserve, and improvements in U.S. financial 
system diversification resulting from deregulation, consolidation, and globaliza- 
tion” (Calomiris, 2009, page 6). 


This implies that all financial components, and not only the banking system, can potentially 
be a source of systemic risk. This is why the bankruptcy of a financial institution cannot be 
compared to the bankruptcy of a corporate company. Nevertheless, because of the nature 
of the systemic risk, it is extremely difficult to manage it directly. This explains why 
financial supervision is principally a micro-prudential regulation at the firm level. It is 
only recently that it has been complemented by macro-prudential policies in order to mitigate 
the risk of the financial system as a whole. While the development of risk management was 
principally due to the advancement of internal models before the 2008 financial crisis, it is 
now driven by the financial regulation, which completely reshapes the finance industry. 
1.2 Financial regulation 


The purpose of supervision and regulatory capital has been to control the riskiness 
of individual banks and to increase the stability of the financial system. As explained in 
the previous section, it is a hard task whose bounds are not well defined. Among all the 
institutions that are participating in this work (see Table 1.3), four international authorities 
have primary responsibility for financial regulation: 


1. The Basel Committee on Banking Supervision (BCBS) 

2. The International Association of Insurance Supervisors (IAIS) 

3. The International Organization of Securities Commissions (IOSCO) 
4. The Financial Stability Board (FSB) 


The Basel Committee on Banking Supervision provides a forum for regular cooperation on 
banking supervisory matters. Its main objective is to improve the quality of banking super- 
vision worldwide. The International Association of Insurance Supervisors is the equivalent 
of the Basel Committee for the insurance industry. Its goal is to coordinate local regulations 
and to promote a consistent and global supervision for insurance companies. The Interna- 
tional Organization of Securities Commissions is the international body that develops and 
implements standards and rules for securities and market regulation. While these three au- 
thorities are dedicated to a specific financial industry (banks, insurers and markets), the 
FSB is an international body that makes recommendations about the systemic risk of the 
global financial system. In particular, it is in charge of defining systemically important fi- 
nancial institutions or SIFIs. Among those different regulators, the BCBS is by far the most 
active and the banking regulation is certainly the most homogeneous between countries. 


These four international bodies define standards at the global level and promote con- 
vergence between local supervision. The implementation of the rules is the responsibility 
of national supervisors or regulators¹³. In the case of the European Union, they are the 
European Banking Authority (EBA), the European Insurance and Occupational Pensions 
Authority (EIOPA), the European Securities and Markets Authority (ESMA) and the Eu- 
ropean System of Financial Supervision (ESFS). A fifth authority, the European Systemic 
Risk Board (ESRB), completes the European supervision system. 


The equivalent authorities in the US are the Board of Governors of the Federal Reserve 
System, also known as the Federal Reserve Board (FRB), the Federal Insurance Office (FIO) 
and the Securities and Exchange Commission (SEC). In fact, the financial supervision is 
more complicated in the US as shown by Jickling and Murphy (2010). The supervisor of 
banks is traditionally the Federal Deposit Insurance Corporation (FDIC) for federal banks 
and the Office of the Comptroller of the Currency (OCC) for national banks. However, the 
Dodd-Frank Act created the Financial Stability Oversight Council (FSOC) to monitor sys- 
temic risk. For banks and other financial institutions designated by the FSOC as SIFIs, the 
supervision is directly done by the FRB. The supervision of markets is shared between the 
SEC and the Commodity Futures Trading Commission (CFTC), which supervises derivatives 
trading including futures contracts and options¹⁴. 


¹³The regulator is responsible for setting rules and policy guidelines. The supervisor evaluates the safety 
and soundness of individual banks and verifies that the regulation rules are applied. In Europe, the regulator 
is EBA while the supervisor is ECB. 

¹⁴A complete list of supervisory authorities by country is provided on page 28. 
TABLE 1.3: The supervision institutions in finance 


          Banks       Insurers   Markets   All sectors 
Global    BCBS        IAIS       IOSCO     FSB 
EU        EBA/ECB     EIOPA      ESMA      ESFS 
US        FDIC/FRB    FIO        SEC       FSOC 


1.2.1 Banking regulation 


Banking supervision has evolved considerably since the end of the eighties. 
Here are the principal dates: 


1988 Publication of “International Convergence of Capital Measurement and Capital Stan- 
dards”, which is better known as “The Basel Capital Accord”. This text sets the rules 
of the Cooke ratio. 


1993 Development of the Capital Adequacy Directive (CAD) by the European Commission. 


1996 Publication of “Amendment to the Capital Accord to incorporate Market Risks”. 
This text includes the market risk to compute the Cooke ratio. 


2001 Publication of the second consultative document “The New Basel Capital Accord” 
of the Basel II framework. 


2004 Publication of “International Convergence of Capital Measurement and Capital Standards:
A Revised Framework”. This text establishes the Basel II framework.


2006 Implementation of the Basel II framework. 


2010 Publication of the Basel III framework. 


2013 Beginning of the implementation of the Basel III framework. Its finalization is expected
for January 2027.


2017 Finalization of Basel III reforms.


2019 Publication of “Minimum Capital Requirements for Market Risk”. This is the final
version of the Basel III framework for computing the market risk.


This list places the three Basel Accords within a timeframe. However, it gives a misleading
image of the banking supervision dynamics. To get a better view, Figure 1.5 reports
the cumulative number of standards¹⁵ that have been published by the Basel Committee
on Banking Supervision.

In 1988, the Basel Committee introduced the Cooke ratio¹⁶, which is the minimum
amount of capital a bank should maintain in case of unexpected losses. Its goal is to:


• align the capital held by the bank with the risk taken by the bank;


• enhance the soundness and stability of the banking system;


• and reduce the competitive inequalities between banks¹⁷.


15 They can be found by using the website of the BCBS: https://www.bis.org/bcbs/publications.htm
and selecting the publication type ‘Standards’.

16 This ratio took the name of Peter Cooke, who was the Chairman of the BCBS between 1977 and 1988.

17 This was particularly true between Japanese banks, which were weakly capitalized, and banks in the
US and Europe.

FIGURE 1.5: The huge increase of the number of banking supervision standards


Source: Basel Committee on Banking Supervision and author’s calculations. 


It is measured as follows:

$$\text{Cooke Ratio} = \frac{C}{\mathrm{RWA}}$$

where C and RWA are the capital and the risk-weighted assets of the bank. A risk-weighted
asset is simply defined as a bank’s asset weighted by its risk score or risk weight (RW).
Because a bank’s assets are mainly credits, the notional is generally measured by the exposure
at default (EAD). To compute risk-weighted assets, we then use the following formula:

$$\mathrm{RWA} = \mathrm{EAD} \times \mathrm{RW}$$


The original Basel Accord only considers credit risk and classifies the bank’s exposures into four
categories depending on the value of the risk weights¹⁸ (0%, 20%, 50% and 100%). Concerning
off-balance sheet exposures, engagements are converted to credit risk equivalents
by multiplying the nominal amount by a credit conversion factor (CCF) and the resulting
amounts are risk-weighted according to the nature of the counterparty. Concerning the
numerator of the ratio, the Basel Committee distinguishes tier 1 capital and tier 2 capital.
Tier 1 capital¹⁹ (or core capital) is composed of (1) common stock (or paid-up share


18 These categories are defined as follows: (1) cash, gold, claims on OECD governments and central banks, 
claims on governments and central banks outside OECD and denominated in the national currency are 
risk-weighted at 0%; (2) claims on all banks with a residual maturity lower than one year, longer-term 
claims on OECD incorporated banks, claims on public-sector entities within the OECD are weighted at 
20%; (3) loans secured on residential property are risk-weighted at 50%; (4) longer-term claims on banks 
incorporated outside the OECD, claims on commercial companies owned by the public sector, claims on 
private-sector commercial enterprises are weighted at 100%. 

19 At least 50% of the tier 1 capital should come from the common equity. 
capital) and (2) disclosed reserves (or retained earnings), whereas tier 2 capital represents
supplementary capital such as²⁰ (1) undisclosed reserves, (2) asset revaluation reserves, (3)
general loan-loss reserves (or general provisions), (4) hybrid debt capital instruments and
(5) subordinated debt. The Cooke ratio requires a minimum capital ratio of 8% when considering
both tier 1 and tier 2 capital, whereas the tier 1 capital ratio should be at least half of
the total capital requirement, or 4%.


Example 1 The assets of a bank are composed of $100 mn of US treasury bonds, $100
mn of Brazilian government bonds, $50 mn of residential mortgage, $300 mn of corporate
loans and $20 mn of revolving credit loans. The bank liability structure includes $25 mn of
common stock and $13 mn of subordinated debt.


For each asset, we compute the RWA by choosing the right risk weight factor. We obtain 
the following results: 


Asset                  EAD    RW     RWA
US treasury bonds      100    0%       0
Brazilian Gov. bonds   100    100%   100
Residential mortgage    50    50%     25
Corporate loans        300    100%   300
Revolving credit        20    100%    20
Total                                445


The risk-weighted assets of the bank are then equal to $445 mn. We deduce that the capital
adequacy ratio is:

$$\text{Cooke Ratio} = \frac{38}{445} = 8.54\%$$

This bank meets the regulatory requirements, because the Cooke ratio is higher than 8%
and the tier 1 capital ratio²¹ is also higher than 4%. Suppose now that the capital of the
bank consists of $13 mn of common stock and $25 mn of subordinated debt. In this case,
the bank does not satisfy the regulatory requirements, because the tier 2 capital cannot
exceed the tier 1 capital. Only $13 mn of subordinated debt is then eligible, meaning that
the Cooke ratio is equal to 26/445 = 5.84% and the tier 1 capital ratio is equal to
13/445 = 2.92%.
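
The arithmetic of Example 1 can be checked with a short Python sketch. It is only an illustration: the function names and the data layout are assumptions made here, the risk weights are those of the table above, and the two capital structures are the $38 mn splits discussed in the example ($25 mn of tier 1 with $13 mn of tier 2, then the reverse).

```python
# Illustrative check of Example 1 (amounts in $ mn). The function names and the
# data layout are hypothetical; the risk weights are those used in the example.
EXPOSURES = [
    ("US treasury bonds", 100.0, 0.00),
    ("Brazilian Gov. bonds", 100.0, 1.00),
    ("Residential mortgage", 50.0, 0.50),
    ("Corporate loans", 300.0, 1.00),
    ("Revolving credit", 20.0, 1.00),
]


def risk_weighted_assets(exposures):
    """Sum of EAD x RW over all exposures."""
    return sum(ead * rw for _, ead, rw in exposures)


def cooke_ratios(tier1, tier2, rwa):
    """Return (total capital ratio, tier 1 ratio).

    Tier 2 capital is recognized only up to the amount of tier 1 capital.
    """
    eligible_tier2 = min(tier2, tier1)
    return (tier1 + eligible_tier2) / rwa, tier1 / rwa


rwa = risk_weighted_assets(EXPOSURES)
for tier1, tier2 in [(25.0, 13.0), (13.0, 25.0)]:
    total_ratio, tier1_ratio = cooke_ratios(tier1, tier2, rwa)
    compliant = total_ratio >= 0.08 and tier1_ratio >= 0.04
    print(f"T1={tier1}, T2={tier2}: total={total_ratio:.2%}, "
          f"tier 1={tier1_ratio:.2%}, compliant={compliant}")
```

Capping tier 2 at the level of tier 1 inside `cooke_ratios` is what makes the second capital structure fail, even though the gross amount of capital is unchanged.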


The Basel Accord, which has been adopted by more than 100 countries, was implemented
in the US by the end of 1992 and in Europe in 1993. In 1996, the Basel Committee
published a revision of the original Accord by incorporating market risk. This means that
banks have to calculate capital charges for market risk in addition to the credit risk. The
major difference with the previous approach to measure credit risk is that banks have the
choice between two methods for applying capital charges for the market risk:


• the standardized measurement method (SMM);
• the internal model-based approach²² (IMA).


Within the SMM, the bank applies a fixed capital charge for each asset. The market risk
requirement is therefore the sum of the capital charges for all the assets that compose the
bank’s portfolio. With the IMA, the bank estimates the market risk capital charge by computing
the 99% value-at-risk of the portfolio loss for a holding period of 10 trading days. From a


20 The comprehensive definitions and restrictions to define all the elements of capital are given in
Appendix 1 in BCBS (1988).

21 The tier 1 capital ratio is equal to 25/445 = 5.62%.

22 The use of the internal model-based approach is subject to the approval of the national supervisor.

statistical point of view, the value-at-risk²³ with a confidence level α is defined as the
quantile α of the probability distribution of the portfolio loss (see Figure 1.6).


[Figure: plot labels “Required Capital”, “Profit” and “Portfolio loss”]


FIGURE 1.6: Probability distribution of the portfolio loss
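
To make the quantile definition concrete, here is a minimal Python sketch. It rests on assumptions that are not in the text: the portfolio loss is taken to be Gaussian with zero mean, daily losses are scaled to the holding period with the square-root-of-time rule, and the exposure and daily volatility are hypothetical inputs. The function name `gaussian_var` is also introduced here for illustration only.

```python
from statistics import NormalDist


def gaussian_var(exposure, daily_vol, alpha=0.99, horizon_days=10):
    """Quantile alpha of a zero-mean Gaussian loss over the holding period.

    Assumes i.i.d. daily returns, so that volatility scales with sqrt(time).
    """
    z_alpha = NormalDist().inv_cdf(alpha)  # about 2.33 for alpha = 99%
    return exposure * daily_vol * horizon_days ** 0.5 * z_alpha


# Hypothetical inputs: a $25 mn equity portfolio with a 0.93% daily volatility
var_99 = gaussian_var(exposure=25.0, daily_vol=0.0093)
print(f"99% 10-day VaR: ${var_99:.2f} mn")
```

With these assumed inputs the value-at-risk is roughly $1.7 mn. An internal model used in practice would rely on a richer loss distribution than this Gaussian shortcut.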


Another difference with credit risk is that the bank directly computes the market risk
capital requirement $K_{\mathrm{MR}}$ with these two approaches²⁴. Therefore, the Cooke ratio
becomes²⁵:

$$\frac{C_{\mathrm{Bank}}}{\mathrm{RWA} + 12.5 \times K_{\mathrm{MR}}} \geq 8\%$$

We deduce that:

$$C_{\mathrm{Bank}} \geq \underbrace{8\% \times \mathrm{RWA}}_{K_{\mathrm{CR}}} + K_{\mathrm{MR}}$$

meaning that 8% × RWA can be interpreted as the credit risk capital requirement $K_{\mathrm{CR}}$,
which can be compared to the market risk capital charge $K_{\mathrm{MR}}$.


Example 2 We consider Example 1 and assume that the bank has a market risk on an 
equity portfolio of $25 mn. The corresponding risk capital charge for a long exposure on a 
diversified portfolio of stocks is equal to 12%. Using its internal model, the bank estimates 
that the 99% quantile of the portfolio loss is equal to $1.71 mn for a holding period of 10 
days. 


23 In the Basel III framework, the expected shortfall, which is defined as the average loss beyond the
value-at-risk, replaces the value-at-risk for computing the market risk.

24 We use the symbols C and K in order to make the distinction between the capital of the bank and the
regulatory capital requirement.

25 When considering market risk, the total capital may include tier 3 capital, consisting of short-term
subordinated debt with an original maturity of at least 2 years.
In the case of the standardized measurement method, the market risk capital requirement
is equal to $3 mn²⁶. The capital ratio becomes:

$$\text{Cooke Ratio} = \frac{38}{445 + 12.5 \times 3} = 7.88\%$$

In this case, the bank does not meet the minimum capital requirement of 8%. If the bank
uses its internal model, the Cooke ratio is satisfied:

$$\text{Cooke Ratio} = \frac{38}{445 + 12.5 \times 1.71} = 8.15\%$$
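
The two ratios of Example 2 can be reproduced with the following Python sketch. The function name is an assumption made for illustration; the inputs are the $38 mn of capital and $445 mn of credit risk-weighted assets from Example 1, the 12% standardized charge on the $25 mn equity portfolio, and the $1.71 mn internal model estimate.

```python
def cooke_ratio_with_market_risk(capital, rwa_credit, k_mr):
    """Capital ratio when the market risk charge K_MR is converted into
    risk-weighted assets with the factor 12.5 = 1 / 8%."""
    return capital / (rwa_credit + 12.5 * k_mr)


capital, rwa_credit = 38.0, 445.0  # $ mn, taken from Example 1

# Standardized measurement method: 12% of the $25 mn equity portfolio
ratio_smm = cooke_ratio_with_market_risk(capital, rwa_credit, 0.12 * 25.0)

# Internal model-based approach: 99% 10-day loss quantile of $1.71 mn
ratio_ima = cooke_ratio_with_market_risk(capital, rwa_credit, 1.71)

print(f"SMM: {ratio_smm:.2%}, IMA: {ratio_ima:.2%}")
```

The only difference between the two calls is the capital charge $K_{\mathrm{MR}}$, which is enough to move the bank from one side of the 8% threshold to the other.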

The Basel Accord has been highly criticized, because the capital charge for credit risk is
too simplistic and not sufficiently risk sensitive: limited differentiation of credit risk, no maturity,
coarse granularity of risk weights, etc. This resulted in regulatory arbitrage through the use of
securitization between assets with the same regulatory risk but different economic risk. In June
1999, the Basel Committee produced an initial consultative document with the objective
to replace the 1988 Accord by a new capital adequacy framework. This paper introduces
some features of Basel II, but it is really the publication of the second consultative
paper in January 2001 that marks a milestone for banking regulation. Indeed, the
2001 publication is highly detailed and comprehensive, and the implementation of this new
framework seemed very complex at that time. The reaction of the banking industry was
negative and somewhat hostile at the beginning, in particular because the Basel Committee
introduced a third capital charge for operational risk besides credit and market risks and
the implementation costs were very high. It took a long time for the Basel Committee
and the banking industry to converge on an accord. The finalized Basel II framework was
eventually published in June 2004.


TABLE 1.4: The three pillars of the Basel II framework


Pillar 1               Pillar 2                  Pillar 3
Minimum Capital        Supervisory Review        Market Discipline
Requirements           Process
Credit risk            Review & reporting        Capital structure
Market risk            Capital above Pillar 1    Capital adequacy
Operational risk       Supervisory monitoring    Models & parameters
                                                 Risk management


As illustrated in Table 1.4, the new Accord consists of three pillars: 


1. the first pillar corresponds to minimum capital requirements, that is, how to compute 
the capital charge for credit risk, market risk and operational risk; 


2. the second pillar describes the supervisory review process; it explains the role of the 
supervisor and gives the guidelines to compute additional capital charges for specific 
risks, which are not covered by the first pillar; 


26 We have: $K_{\mathrm{MR}} = 12\% \times 25 = 3$.

3. the market discipline establishes the third pillar and details the disclosure of required
information regarding the capital structure and the risk exposures of the bank.

Regarding the first pillar, the Cooke ratio becomes:

$$\frac{C_{\mathrm{Bank}}}{\mathrm{RWA} + 12.5 \times K_{\mathrm{MR}} + 12.5 \times K_{\mathrm{OR}}} \geq 8\%$$

where $K_{\mathrm{OR}}$ is the capital charge for operational risk. This implies that the required capital
is directly computed for market risk and operational risk whereas credit risk is indirectly
measured by risk-weighted assets²⁷.


Example 3 We assume that the risk-weighted assets for the credit risk are equal to $500 
mn, the capital charge for the market risk is equal to $10 mn and the capital charge for the 
operational risk is equal to $3 mn. 


We deduce that the required capital for the bank is: 


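$$\begin{aligned}
K &= 8\% \times (\mathrm{RWA} + 12.5 \times K_{\mathrm{MR}} + 12.5 \times K_{\mathrm{OR}}) \\
  &= 8\% \times \mathrm{RWA} + K_{\mathrm{MR}} + K_{\mathrm{OR}} \\
  &= 8\% \times 500 + 10 + 3 \\
  &= \$53 \text{ mn}
\end{aligned}$$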


This implies that credit risk represents 75.5% of the total risk. 
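
The decomposition of Example 3 can be sketched in Python as follows. The function name and the dictionary layout are assumptions made for illustration; the inputs are the figures of the example.

```python
def required_capital(rwa_credit, k_mr, k_or, min_ratio=0.08):
    """Return the total required capital and the share of each risk type.

    Credit risk enters through risk-weighted assets, whereas the market and
    operational risk charges are capital amounts that are added directly.
    """
    charges = {
        "credit": min_ratio * rwa_credit,
        "market": k_mr,
        "operational": k_or,
    }
    total = sum(charges.values())
    shares = {risk: charge / total for risk, charge in charges.items()}
    return total, shares


# Example 3 (amounts in $ mn)
total, shares = required_capital(rwa_credit=500.0, k_mr=10.0, k_or=3.0)
print(f"Required capital: ${total:.0f} mn")
for risk, share in shares.items():
    print(f"  {risk}: {share:.1%}")
```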


With respect to the original Accord, the Basel Committee did not change the market 
risk approach whereas it profoundly changed the methods to compute the capital charge 
for the credit risk. Two approaches are proposed: 


• The standardized approach (SA)
This approach, which is more risk sensitive than Basel I, is based on external ratings
provided by credit rating agencies. The capital charge is computed by considering a
mapping function between risk weights and credit ratings.


• The internal ratings-based approach (IRB)
This approach can be viewed as an external risk model with internal and external
risk parameters. The key parameter is the default probability of the asset, which is
deduced from the internal credit rating model of the bank. The Basel Committee
makes the distinction between two methods. In the foundation IRB (FIRB), the bank
only estimates the probability of default and uses standard values for the other risk
parameters of the model. In the advanced IRB (AIRB), the bank may estimate all
the risk parameters.


Regarding operational risk, the Basel Committee proposes three approaches to compute the
required capital:


• The Basic Indicator Approach (BIA)
In this case, the capital charge is a fixed percentage of the gross income.


• The Standardized Approach (TSA)
This method consists of dividing the bank’s activities into eight business lines. For each
business line, the capital charge is a fixed percentage β of its gross income. The
parameter β depends on the riskiness of the business line. The total capital is the sum
of the eight regulatory capital charges.


27 In fact, we can define risk-weighted assets for each category of risk. We have the following relationships
$\mathrm{RWA}_R = 12.5 \times K_R$ and $K_R = 8\% \times \mathrm{RWA}_R$ where $K_R$ is the required capital for the risk R. The choice
of defining either $\mathrm{RWA}_R$ or $K_R$ is a mere convention.

• Advanced Measurement Approaches (AMA)
In this approach, the bank uses a statistical model with internal data for estimating
the total capital.


A summary of the different options is reported in Figure 1.7. 


FIGURE 1.7: Minimum capital requirements in the Basel II framework


The European Union adopted the Basel II framework in June 2006 with the capital
requirements directive²⁸ (CRD). In the United States, Basel II has been partially applied since 2006
and only concerns the largest banking institutions (Getter, 2014). Since the 2004 publication,
more than 40 countries have fully implemented Basel II (Hong Kong in January 2007, Japan
in March 2007, Canada in November 2007, South Korea in December 2007, Australia in
January 2008, South Africa in January 2008, etc.). However, the subprime crisis in 2007 and
the collapse of Lehman Brothers in September 2008 illustrated the limits of the New Accord
concerning the issues of leverage and liquidity. In response to the financial market crisis,
the Basel Committee then enhanced the New Accord by issuing a set of documents between
2009 and 2010. In July 2009, the Basel Committee approved a package of measures to
strengthen the rules governing trading book capital, particularly the market risk associated


28 It replaces CAD II (or the 98/31/EEC directive), which is the revision of the original CAD and
incorporates market risk.

with securitization and credit-related products. Known as the Basel 2.5 framework, these new
rules can be summarized into four main elements, which are: 


1. the incremental risk charge (IRC), which is an additional capital charge to capture 
default risk and migration risk for unsecuritized credit products; 


2. the stressed value-at-risk requirement (SVaR), which is intended to capture stressed 
market conditions; 


3. the comprehensive risk measure (CRM), which is an estimate of risk in the credit 
correlation trading portfolio (CDS baskets, CDO products, etc.); 


4. new standardized charges on securitization exposures, which are not covered by CRM. 


In addition to these elements affecting the first pillar, the Basel Committee also expanded the
second pillar (largest exposures and risk concentrations, remuneration policies, governance
and risk management) and enhanced the third pillar (securitization and re-securitization
exposures). Basel 2.5 came into force in December 2011 in the European Union²⁹
and in January 2013 in the United States (BCBS, 2015b).


In December 2010, the Basel Committee published a new regulatory framework in order
to enhance risk management, increase the stability of the financial markets and improve
the banking industry’s ability to absorb macro-economic shocks. The Basel III framework
consists of micro-prudential and macro-prudential regulation measures concerning:


• a new definition of the risk-based capital;
• the introduction of a leverage ratio;


• the management of the liquidity risk.


The capital is redefined as follows. Tier 1 capital is composed of common equity tier 1
capital (common equity and retained earnings or CET1) and additional tier 1 capital (AT1).
The new capital ratios are 4.5% for CET1, 6% for tier 1 and 8% for total capital (T1 +
T2). Therefore, Basel III gives preference to tier 1 capital rather than tier 2 capital, whereas
the tier 3 risk capital is eliminated. BCBS (2010) also introduced a surplus of CET1, which
is “designed to ensure that banks build up capital buffers outside periods of stress which
can be drawn down as losses are incurred”. This capital conservation buffer (CB), which
is equal to 2.5% of RWA, applies at all times outside periods of stress. The aim is to
reduce the distribution of earnings and to support the business of the bank through periods
of stress. A macro-prudential approach completes capital requirements by adding a second
capital buffer called the countercyclical capital buffer (CCB). During periods of excessive
credit growth, national authorities may require an additional capital charge between 0%
and 2.5%, which increases the CET1 ratio up to 9.5% (including the conservation buffer).
The underlying idea is to smooth the credit cycle, to reduce procyclicality and to help
banks to provide credit during bad periods of economic growth. The implementation of
this new framework is progressive from April 2013 until March 2019. A summary of capital
requirements³⁰ and transitional periods is given in Table 1.5.
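
The way the minimum ratios and the two buffers stack up can be sketched in Python. This is a simplified illustration under stated assumptions: the function names and the hypothetical bank are introduced here, the buffers are treated as add-ons to each of the three ratios, and the systemic buffer and the transitional arrangements are ignored.

```python
# Fully phased-in minimum ratios and buffers quoted in the text (fractions of RWA)
MIN_CET1, MIN_TIER1, MIN_TOTAL = 0.045, 0.06, 0.08
CONSERVATION_BUFFER = 0.025
MAX_CCB = 0.025


def cet1_requirement(ccb=0.0):
    """CET1 ratio required once both buffers are included."""
    if not 0.0 <= ccb <= MAX_CCB:
        raise ValueError("the countercyclical buffer lies between 0% and 2.5%")
    return MIN_CET1 + CONSERVATION_BUFFER + ccb


def meets_basel3(cet1, at1, tier2, rwa, ccb=0.0):
    """Check the three capital ratios, with the buffers added to each minimum."""
    buffers = CONSERVATION_BUFFER + ccb
    cet1_ratio = cet1 / rwa
    tier1_ratio = (cet1 + at1) / rwa
    total_ratio = (cet1 + at1 + tier2) / rwa
    return (cet1_ratio >= MIN_CET1 + buffers
            and tier1_ratio >= MIN_TIER1 + buffers
            and total_ratio >= MIN_TOTAL + buffers)


# Hypothetical bank (amounts in $ mn): CET1 = 40, AT1 = 8, T2 = 10, RWA = 500
print(f"CET1 requirement, no CCB:   {cet1_requirement(0.0):.1%}")
print(f"CET1 requirement, full CCB: {cet1_requirement(0.025):.1%}")
print(meets_basel3(40.0, 8.0, 10.0, 500.0, ccb=0.0))
print(meets_basel3(40.0, 8.0, 10.0, 500.0, ccb=0.025))
```

The same hypothetical bank passes when the countercyclical buffer is switched off and fails when it is set to its maximum, which is the tightening effect that the macro-prudential buffer is meant to have during a credit boom.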


This new definition of the capital is accompanied by a change of the required capital 
for counterparty credit risk (CCR). In particular, BCBS (2010) adds a credit valuation 


29 The Basel 2.5 framework was adopted in two stages: CRD II (or the 2009/111/EC directive) in November
2009 and CRD III (or the 2010/76/EU directive) in December 2010.

30 Basel III defines a third capital buffer for systemic banks, which can vary between 1% and 3.5%. This
topic will be presented later in the paragraph dedicated to systemically important financial institutions on
page 26.
TABLE 1.5: Basel III capital requirements


Capital ratio   2013    2014    2015    2016      2017     2018      2019
CET1            3.5%    4.0%    4.5%    4.5%      4.5%     4.5%      4.5%
CB                                      0.625%    1.25%    1.875%    2.5%
CET1 + CB       3.5%    4.0%    4.5%    5.125%    5.75%    6.375%    7.0%
Tier 1          4.5%    5.5%    6.0%    6.0%      6.0%     6.0%      6.0%
Total           8.0%    8.0%    8.0%    8.0%      8.0%     8.0%      8.0%
Total + CB      8.0%    8.0%    8.0%    8.625%    9.25%    9.875%    10.5%
CCB                                     0% to 2.5%


Source: Basel Committee on Banking Supervision, www.bis.org/bcbs/basel3.htm.


adjustment charge (CVA) for OTC derivative trades. CVA is defined as the market risk of 
losses caused by changes in the credit spread of a counterparty due to changes in its credit 
quality. It also corresponds to the market value of counterparty credit risk. 


Basel III also includes a leverage ratio to prevent the build-up of excessive on- and
off-balance sheet leverage in the banking sector. BCBS (2014a) defines this ratio as follows:

$$\text{Leverage ratio} = \frac{\text{Tier 1 capital}}{\text{Total exposures}} \geq 3\%$$

where total exposures are the sum of on-balance sheet exposures, derivative exposures and
some adjustments concerning off-balance sheet items. The leverage ratio can be viewed as
the second macro-prudential measure of Basel III. Indeed, during a credit boom, we generally
observe a compression of risk-weighted assets and a growth of the leverage, because the number of
profitable projects increases during good economic times. For instance, Brei and Gambacorta
(2014) show that the Basel III leverage ratio is negatively correlated with GDP or credit
growth. By introducing a floor value, the Basel Committee expects that the leverage ratio
will help to reduce the procyclicality like the countercyclical capital buffer.
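
The role of the leverage ratio as a floor can be illustrated with a small Python sketch. The bank below is entirely hypothetical, and the function name is an assumption; the thresholds are the 6% tier 1 ratio and the 3% leverage ratio of Basel III.

```python
def leverage_ratio(tier1, total_exposures):
    """Tier 1 capital divided by total (non risk-weighted) exposures."""
    return tier1 / total_exposures


# Hypothetical bank (amounts in $ mn) holding mostly low risk-weight assets
tier1, rwa, total_exposures = 20.0, 250.0, 1000.0

tier1_ratio = tier1 / rwa                       # risk-based measure
lev_ratio = leverage_ratio(tier1, total_exposures)  # non risk-based measure

print(f"Tier 1 ratio:   {tier1_ratio:.1%} (minimum 6%)")
print(f"Leverage ratio: {lev_ratio:.1%} (minimum 3%)")
```

In this example the bank satisfies the risk-based tier 1 ratio because its risk weights are low, but it breaches the leverage floor, which does not depend on risk weights.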


The management of liquidity is another important issue of Basel III. The bankruptcy
of Lehman Brothers was followed by a lack of liquidity, which is one of the main sources
of systemic risk. For instance, Brunnermeier and Pedersen (2009) demonstrated that a
liquidity dry-up event arising from a flight-to-quality environment can result in runs, fire
sales, and asset liquidations in general, transforming the market into a contagion mechanism.
In order to prevent such events, the Basel Committee proposed several liquidity rules and
introduced in particular two liquidity ratios: the liquidity coverage ratio (LCR) and the net
stable funding ratio (NSFR). The objective of the LCR is to promote short-term resilience
of the bank’s liquidity risk profile. It is expressed as:

$$\mathrm{LCR} = \frac{\mathrm{HQLA}}{\text{Total net cash outflows}} \geq 100\%$$


where HQLA is the stock of high quality liquid assets and the denominator is the total net
cash outflows over the next 30 calendar days. Therefore, the LCR is designed to ensure that
the bank has the necessary assets to face a one-month stressed period of outflows. By
contrast, the NSFR is designed to promote long-term resilience of the bank’s liquidity
profile. It is defined as the amount of available stable funding (ASF) relative to the amount
of required stable funding (RSF):

$$\mathrm{NSFR} = \frac{\text{Available amount of stable funding}}{\text{Required amount of stable funding}} \geq 100\%$$

The amount of available stable funding is equal to the regulatory capital³¹ plus the other
liabilities to which we apply a scaling factor between 0% and 100%. The amount of required
stable funding is the sum of two components: risk-weighted assets and off-balance sheet
exposures.
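
The two liquidity ratios can be sketched in Python as follows. All balance sheet figures and scaling factors below are hypothetical and do not come from the Basel texts; the function names are also assumptions. The required stable funding is passed as a single number, because its detailed computation is not described here.

```python
def lcr(hqla, net_cash_outflows_30d):
    """Liquidity coverage ratio over a 30-day stressed horizon."""
    return hqla / net_cash_outflows_30d


def available_stable_funding(regulatory_capital, liabilities):
    """Regulatory capital plus liabilities weighted by a factor in [0, 1].

    `liabilities` is a list of (amount, scaling factor) pairs.
    """
    for _, factor in liabilities:
        if not 0.0 <= factor <= 1.0:
            raise ValueError("scaling factors lie between 0% and 100%")
    return regulatory_capital + sum(a * f for a, f in liabilities)


def nsfr(asf, rsf):
    """Net stable funding ratio."""
    return asf / rsf


# Hypothetical bank (amounts in $ mn)
lcr_value = lcr(hqla=120.0, net_cash_outflows_30d=100.0)
asf = available_stable_funding(
    regulatory_capital=50.0,
    liabilities=[(200.0, 0.9), (300.0, 0.5), (100.0, 0.0)],
)
nsfr_value = nsfr(asf, rsf=400.0)

print(f"LCR  = {lcr_value:.0%} (minimum 100%)")
print(f"NSFR = {nsfr_value:.0%} (minimum 100%)")
```

The example is built so that the bank passes the short-term test but not the long-term one, which shows that the two ratios constrain different parts of the funding structure.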


The implementation of Basel III was due in January 2013, but some countries have
delayed the adoption of the full package. According to BCBS (2015b), the rules for risk-based
capital have been more widely adopted than those concerning the liquidity ratio or the leverage ratio. In
the US, the rules for risk-based capital and the leverage ratio have been effective since January
2014, while the LCR rule came into effect in January 2015. In the European Union, the Basel
III agreement was transposed in July 2013 into two texts: the CRD IV (or the 2013/36/EU
directive) and the capital requirements regulation (CRR) (or the 575/2013 EU regulation).
Therefore, Basel III has been effective since January 2014 for the rules of risk-based capital and
leverage ratio and since October 2015 for the LCR rule.


Even before Basel III was fully implemented, the Basel Committee published a set of
consultative documents, which were viewed as the basis of a future Basel IV Accord.
The guiding principle of these works is to simplify the different approaches to compute the
regulatory capital and to reduce the risk of arbitrage between standardized and advanced
methods. These new proposals concern the review of the market risk measurement (BCBS,
2013b, 2014h, 2016a), the revision of the standardized approach for credit (BCBS, 2015d) and
operational risks (BCBS, 2014f, 2016b), minimum capital requirements for interest rate risk
in the banking book (BCBS, 2016d) and a modified framework for the CVA risk (BCBS,
2015c). Finally, the Basel Committee created a surprise in 2017 by announcing that all
these reforms correspond to the finalization of the Basel III Accord. The changes are very
significant. For instance, the finalized framework replaces the VaR measure by the expected
shortfall measure. The risk weight of residential real estate exposures will depend on the
loan-to-value (LTV) ratio. It also imposes some constraints on the use of internal credit risk
models, in particular the removal of the IRB approach for bank, large corporate and equity
exposures. CVA requirements will be based on two approaches: SA-CVA and BA-CVA. For
counterparty credit risk, the IMM-CCR method will be constrained by a floor with respect to
the SA-CCR method. In the case of operational risk, the three approaches (BIA, TSA and AMA)
are replaced by a unique approach called the Standardized Measurement Approach (SMA).
For market risk, the boundary between trading book and banking book is changed, and the
standard approach is fully revisited and is based on risk sensitivities. Finally, the interest
rate risk of the banking book continues to be monitored in Pillar 2, but its measure is highly
reinforced.


1.2.2 Insurance regulation 


Contrary to the banking industry, the regulation in insurance is national. The International
Association of Insurance Supervisors (IAIS) is an association to promote globally
consistent supervision. For that, the IAIS is responsible for developing principles and standards,
which form the Insurance Core Principles (ICP). For instance, the last release of ICP
was in November 2018 and contained 26 ICPs³². However, its scope of intervention is more
limited than that of the BCBS. In particular, the IAIS does not produce any methodologies
of risk management or formula to compute risk-based capital. In Europe, the regulatory
framework is the Solvency II directive (or the 2009/138/EC directive), which harmonizes
the insurance regulation and capital requirements in the European Union. In the US, the


31 Excluding tier 2 instruments with residual maturity of less than one year.
32 ICP 1 concerns the objectives, powers and responsibilities of the supervisor, ICP 17 is dedicated to
capital adequacy, ICP 24 presents the macro-prudential surveillance and insurance supervision, etc.

supervisor is the National Association of Insurance Commissioners (NAIC). In 2008, it
created a Solvency Modernization Initiative (SMI) in order to reform the current framework
in the spirit of Solvency II. However, the convergence across the different jurisdictions is far
from being reached.


Solvency I (or the 2002/13/EC directive) is a set of rules to define the insurance solvency
regime and was put in place in January 2004 in the European Union. It defined how an
insurance company should calculate its liabilities and the required capital. In this framework,
the capital is the difference between the book value of assets and the technical provisions
(or insurance liabilities). This capital is decomposed into the solvency capital requirement (or
SCR) and the surplus (see Figure 1.8). One of the main drawbacks of Solvency I is that
assets and liabilities are evaluated using an accounting approach (historical or amortized
cost).


[Figure: label “Book Value of Assets”]


FIGURE 1.8: Solvency I capital requirement


In an address to the European Insurance Forum 2013, Matthew Elderfield, Deputy
Governor of the Central Bank of Ireland, justified the reform of the insurance regulation in
Europe as follows:


“[...] it is unacceptable that the common regulatory framework for insurance in
Europe in the 21st-century is not risk-based and only takes account, very crudely, 
of one side of the balance sheet. The European Union urgently needs a new 
regulatory standard which differentiates solvency charges based on the inherent 
risk of different lines of business and which provides incentives for enhanced risk 
management. It urgently needs a framework that takes account of asset risks 
in an insurance company. It urgently needs a framework that encourages better 
governance and management of risk. And it urgently needs a framework that 
provides better disclosure to market participants” (Elderfield, 2013, page 1). 


With Solvency II, capital requirements are then based on an economic valuation of the
insurer’s balance sheet, meaning that:

• assets are valued at their market value;


• liabilities are valued on a best estimate basis.


[Figure: labels “Market Value of Assets” and “Risk Margin”]


FIGURE 1.9: Solvency II capital requirement


In this framework, the economic value of liabilities corresponds to the expected present
value of the future cash flows. Technical provisions are then the sum of the liabilities best
estimate and a risk margin (or prudence margin) in order to take into account non-hedgeable
risk components. Solvency II defines two levels of capital requirements. The minimum capital
requirement (MCR) is the capital level below which risks are considered
unacceptable. The solvency capital requirement (SCR) is the targeted required capital
(SCR ≥ MCR). The underlying idea is to cover the different sources of risk at a 99.5%
confidence level³³ for a holding period of one year. The insurance company may opt for the
standard formula or its own internal model for computing the required capital. In the case
of the standard formula method, the SCR of the insurer is equal to:

$$\mathrm{SCR} = \sqrt{\sum_{i,j} \rho_{i,j} \cdot \mathrm{SCR}_i \cdot \mathrm{SCR}_j} + \mathrm{SCR}_{\mathrm{OR}}$$

where $\mathrm{SCR}_i$ is the SCR of the risk module i, $\mathrm{SCR}_{\mathrm{OR}}$ is the SCR associated to the operational
risk and $\rho_{i,j}$ is the correlation factor between risk modules i and j. Solvency II considers
several risk components: underwriting risk (non-life, life, health, etc.), market risk, default
and counterparty credit risk³⁴. For each risk component, a formula is provided to compute
the SCR of the risk factors. Regarding the capital C, own funds are classified into basic
own funds and ancillary own funds. The basic own funds consist of the excess of assets over

33 It is set to 85% for the MCR.
34 Solvency II is an ambitious and complex framework because it mixes both assets and liabilities, risk
management and ALM.

liabilities, and subordinated liabilities. The ancillary own funds correspond to other items
which can be called up to absorb losses. Examples of ancillary own funds are unpaid share
capital or letters of credit and guarantees. Own funds are then divided into tiers depending
on their permanent availability and subordination. For instance, tier 1 corresponds to basic
own funds which are immediately available and fully subordinated. The solvency ratio is
then defined as:

$$\text{Solvency Ratio} = \frac{C}{\mathrm{SCR}}$$


This solvency ratio must be larger than 33% for tier 1 and 100% for the total own funds. 
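
The standard formula aggregation can be illustrated with the following Python sketch. The module values, the correlation factor and the own funds are hypothetical numbers chosen for the example and are not the regulatory parameters; the function names are also assumptions.

```python
import math


def aggregate_scr(scr_modules, corr, scr_operational):
    """Square-root aggregation of module SCRs plus the operational risk SCR.

    `corr` is the (symmetric) correlation matrix between risk modules,
    given as a list of rows with ones on the diagonal.
    """
    n = len(scr_modules)
    quadratic_form = sum(
        corr[i][j] * scr_modules[i] * scr_modules[j]
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(quadratic_form) + scr_operational


def solvency_ratio(own_funds, scr):
    """Own funds C divided by the solvency capital requirement."""
    return own_funds / scr


# Hypothetical insurer (amounts in $ mn): market and underwriting modules
modules = [60.0, 80.0]
corr = [[1.00, 0.25],
        [0.25, 1.00]]

scr = aggregate_scr(modules, corr, scr_operational=10.0)
print(f"SCR = {scr:.2f}")
print(f"Solvency ratio = {solvency_ratio(150.0, scr):.1%}")
```

Because the correlation is below one, the aggregated SCR is lower than the simple sum of the module SCRs, which is how the formula recognizes diversification between risk modules. The operational risk SCR is added outside the square root and therefore receives no diversification benefit.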


The quantitative approach to compute MCR, SCR and the technical provisions defines
Pillar 1 (Figure 1.9). As in the Basel II framework, it is completed by two other pillars. Pillar 2
corresponds to the governance of the solvency system and concerns qualitative requirements,
rules for supervisors and own risk and solvency assessment (ORSA). Pillar 3 includes market
disclosures and also supervisory reporting.


1.2.3 Market regulation 


Banks and insurers are not the only financial institutions that are regulated and the 
financial regulatory framework does not reduce to Basel III and Solvency II. In fact, a whole 
variety of legislation measures helps to regulate the financial market and the participants. 


In Europe, the markets in financial instruments directive or MiFID³⁵ came into force
in November 2007. Its goal was to establish a regulatory framework for the provision
of investment services in financial instruments (such as brokerage, advice, dealing, portfolio
management, underwriting, etc.) and for the operation of regulated markets by market
operators. The scope of application concerns various aspects such as passporting, client
categorization (retail/professional investor), pre-trade and post-trade transparency or best
execution procedures. In August 2012, MiFID was complemented by the European market
infrastructure regulation (EMIR), which is specifically designed to increase the stability of
OTC derivative markets by promoting central counterparty clearing and trade repositories.
In June 2014, MiFID was revised (MiFID 2) and the regulation on markets in financial
instruments (MiFIR) replaced EMIR. According to ESMA³⁶, this supervisory framework
concerned 104 European regulated markets as of May 2015. In April 2014, the
European Parliament completed the framework by publishing new rules to protect retail
investors (packaged retail and insurance-based investment products or PRIIPS). These rules
complete the various UCITS directives, which organize the distribution of mutual funds in
Europe.


In the US, the regulation of the market dates back to the 1930s: 


• The Securities Act of 1933 concerns the distribution of new securities. 


• The Securities Exchange Act of 1934 regulates trading securities, brokers, and exchanges, 
  whereas the Commodity Exchange Act regulates the trading of commodity 
  futures. 


• The Trust Indenture Act of 1939 defines the regulating rules for debt securities. 


• The Investment Company Act of 1940 is the initial regulation framework of mutual 
  funds. 


• The Investment Advisers Act of 1940 is dedicated to investment advisers. 


35 It corresponds to the 2004/39/EC directive. 
36 See the website www.esma.europa.eu/databases-library/registers-and-data. 
At the same time, the Securities and Exchange Commission (SEC) was created to monitor 
financial markets (stocks and bonds). Now, the area of SEC supervision is enlarged and con- 
cerns stock exchanges, brokers, mutual funds, investment advisors, some hedge funds, etc. 
In 1974, the Commodity Futures Trading Commission Act established the Commodity Futures 
Trading Commission (CFTC) as the supervisory agency responsible for regulating the 
trading of futures contracts. Market regulation in the US did not change significantly 
until the 2008 Global Financial Crisis (GFC). In 2010, President Barack Obama signed 
an ambitious federal law, the Dodd-Frank Wall Street Reform and Consumer Protection 
Act also named more simply Dodd-Frank, which is viewed as a response to the crisis. This 
text has an important impact on various areas of regulation (banking, market, investors, 
asset managers, etc.). It also introduces a new dimension in regulation. It concerns the co- 
ordination among regulators with the creation of the Financial Stability Oversight Council 
(FSOC), whose goal is to monitor the systemic risk. 


1.2.4 Systemic risk 


The 2008 financial crisis had an unprecedented impact on financial regulation. It was 
responsible for Basel III, Dodd-Frank, the Volcker rule, etc., but it also inspired new 
considerations on systemic risk. Indeed, the creation of the Financial Stability Board (FSB) 
in April 2009 was motivated by the need to establish an international body that monitors and makes 
recommendations about the global financial system, and especially the associated systemic 
risk. Its area of intervention covers not only banking and insurance, but also all the other 
financial institutions including asset managers, finance companies, market intermediaries, 
investors, etc. 


The main task of the FSB is to develop assessment methodologies for defining sys- 
temically important financial institutions (SIFIs) and to make policy recommendations for 
mitigating the systemic risk of the financial system. According to FSB (2010), SIFIs are 
institutions whose “distress or disorderly failure, because of their size, complexity and sys- 
temic interconnectedness, would cause significant disruption to the wider financial system 
and economic activity”. By monitoring SIFIs in a different way than other financial institutions, 
the objective of the supervisory authorities is obviously to address the ‘too big to fail’ 
problem. A SIFI can be global (G-SIFI) or domestic (D-SIFI). The FSB also distinguishes 
between three types of G-SIFIs: 


1. G-SIBs correspond to global systemically important banks. 
2. G-SIIs designate global systemically important insurers. 


3. The third category is defined with respect to the two previous ones. It incorporates 
other SIFIs than banks and insurers (non-bank non-insurer global systemically im- 
portant financial institutions or NBNI G-SIFIs). 


The FSB/BCBS framework for identifying G-SIBs is a scoring system based on five cat- 
egories: size, interconnectedness, substitutability/financial institution infrastructure, com- 
plexity and cross-jurisdictional activity (BCBS, 2014g). In November 2018, there were 29 
G-SIBs (FSB, 2015b). Depending on the score value, the bank is then assigned to a spe- 
cific bucket, which is used to calculate the higher loss absorbency (HLA) requirement. This 
additional capital requirement is part of the Basel III framework and ranges from 1% to 
3.5% common equity tier 1. According to FSB (2018b), the most systemically important 
bank is JPMorgan Chase, which is assigned to an additional capital buffer of 2.5% CET1. 
This means that the total capital for this bank can go up to 15.5% with the following 
decomposition: tier 1 = 6.0%, tier 2 = 2.0%, conservation buffer = 2.5%, countercyclical 
buffer = 2.5% and systemic risk capital = 2.5%. 
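
The decomposition above is a simple sum, which can be reproduced as follows. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the countercyclical buffer is taken at the 2.5% value used in the decomposition.

```python
# Components of the total capital ratio (in %), as listed in the decomposition above
BASE_STACK = {
    "tier 1": 6.0,
    "tier 2": 2.0,
    "conservation buffer": 2.5,
    "countercyclical buffer": 2.5,
}


def total_capital_ratio(hla_buffer: float) -> float:
    """Add the higher loss absorbency (HLA) buffer to the other components."""
    # The text states that the HLA requirement ranges from 1% to 3.5% CET1
    if not 1.0 <= hla_buffer <= 3.5:
        raise ValueError("HLA buffer is expected to lie between 1% and 3.5%")
    return sum(BASE_STACK.values()) + hla_buffer


# Bucket with an additional buffer of 2.5% CET1
print(f"Total capital: {total_capital_ratio(2.5):.1f}%")
```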


For insurers, the assessment methodology is close to the methodology for G-SIBs and is 
based on five categories: size, global activity, interconnectedness, non-traditional insurance 
and non-insurance activities and substitutability (IAIS, 2013a). However, this quantitative 
approach is completed by a qualitative analysis and the final list of G-SIIs is the result 
of the IAIS supervisory judgment. In November 2015, there were 9 G-SIIs (FSB, 2015c). 
The associated policy measures are documented in IAIS (2013b) and consist of three main 
axes: recovery and resolution planning requirements, enhanced supervision and higher loss 
absorbency requirements. 


Concerning NBNI SIFIs, FSB and IOSCO are still in a consultation process in order to 
finalize the assessment methodologies (FSB, 2015a). Indeed, the second consultation paper 
considers three categories of participants in the financial sectors that it identifies as potential 
NBNI SIFIs: 


1. finance companies; 
2. market intermediaries, especially securities broker-dealers; 
3. investment funds, asset managers and hedge funds. 


The final assessment methodology was planned for the end of 2015, but it has not been 
published to date. However, the fact that the FSB already considers that there are other 
SIFIs than banks and insurers suggests that financial regulation will be strengthened for 
many financial institutions including the three previous categories but also other financial 
institutions such as pension funds, sovereign wealth funds, etc. 


The identification of SIFIs is not the only task of the FSB. The other important objective 
is to monitor the shadow banking system and to understand how it can pose systemic risk. 
The shadow banking system can be described as “credit intermediation involving entities and 
activities outside the regular banking system” (FSB, 2011). It is also called non-bank credit 
intermediation. The shadow banking system may expose the traditional banking system to 
systemic risk, because there may be spill-over effects between the two systems. Moreover, 
shadow banking entities (SBEs) are not subject to tight regulation like banks. However, they 
run bank-like activities such as maturity transformation, liquidity transformation, leverage 
and credit risk transfer. Examples of shadow banking are money market funds, 
securitization, securities lending, repos, etc. The task force formed by the FSB follows a 
three-step process: 


• the first step is to scan and map the overall shadow banking system and to understand 
  its risks; 

• the second step is to identify the aspects of the shadow banking system posing systemic 
  risk or regulatory arbitrage concerns; 

• the last step is to assess the potential impact of systemic risk induced by the shadow 
  banking system. 


Even if this process is ongoing, shadow banking regulation can be found in Dodd-Frank or 
in the 2015 consultation paper of the EBA. However, until now regulation has principally 
focused on money market funds. 
1.3 Financial regulation overview 
1.3.1 List of supervisory authorities 


We use the following correspondence: B for banking supervision, I for insurance super- 
vision, M for market supervision and S for systemic risk supervision. 


International authorities 


BCBS Basel Committee on Banking Supervision; www.bis.org/bcbs; B 

FSB Financial Stability Board; www.fsb.org; S 

IAIS International Association of Insurance Supervisors; www.iaisweb.org; I 
IOSCO International Organization of Securities Commissions; www.iosco.org; M 


European authorities 


EBA European Banking Authority; eba.europa.eu; B 

ECB/SSM European Central Bank/Single Supervisory Mechanism; www.bankingsupervision.europa.eu; B 

EIOPA European Insurance and Occupational Pensions Authority; eiopa.europa.eu; I 

ESMA European Securities and Markets Authority; www.esma.europa.eu; M 

ESRB European Systemic Risk Board; www.esrb.europa.eu; S 


US authorities 

CFTC Commodity Futures Trading Commission; www.cftc.gov; M 

FRB Federal Reserve Board; www.federalreserve.gov/supervisionreg.htm; B/S 

FDIC Federal Deposit Insurance Corporation; www.fdic.gov; B 

FIO Federal Insurance Office; home.treasury.gov/policy-issues/financial-markets-financial-institutions-and-fiscal-service/federal-insurance-office; I 

FSOC Financial Stability Oversight Council; home.treasury.gov/policy-issues/financial-markets-financial-institutions-and-fiscal-service/fsoc; S 

OCC Office of the Comptroller of the Currency; www.occ.gov; B 

SEC Securities and Exchange Commission; www.sec.gov; M 


Some national authorities 


Canada 

CSA Canadian Securities Administrators; www.securities-administrators.ca; M 

OSFI Office of the Superintendent of Financial Institutions; www.osfi-bsif.gc.ca; B/I 

IIROC Investment Industry Regulatory Organization of Canada; www.iiroc.ca; M 

China 

CBRC China Banking Regulatory Commission; www.cbrc.gov.cn; B 

CIRC China Insurance Regulatory Commission; www.circ.gov.cn; I 


CSRC China Securities Regulatory Commission; www.csrc.gov.cn; M 
France 

AMF Autorité des Marchés Financiers; www.amf-france.org; M 

ACPR Autorité de Contrôle Prudentiel et de Résolution; acpr.banque-france.fr; B/I 

Germany 


BAFIN Bundesanstalt für Finanzdienstleistungsaufsicht; www.bafin.de; B/I/M 


Italy 

BdI Banca d'Italia; www.bancaditalia.it; B 

CONSOB Commissione Nazionale per le Società e la Borsa; www.consob.it; M 
IVASS Istituto per la Vigilanza sulle Assicurazioni; www.ivass.it; I 

Japan 

FSA Financial Services Agency; www.fsa.go.jp; B/I/M 

Luxembourg 

CAA Commissariat aux Assurances; www.caa.lu; I 

CSSF Commission de Surveillance du Secteur Financier; www.cssf.lu; B/M 

Spain 

BdE Banco de España; www.bde.es; B 

CNMV Comisión Nacional del Mercado de Valores; www.cnmv.es; M 

DGS Dirección General de Seguros y Pensiones; www.dgsfp.mineco.es; I 

Switzerland 


FINMA Swiss Financial Market Supervisory Authority; www.finma.ch; B/I/M 


United Kingdom 


FCA Financial Conduct Authority; www.fca.org.uk; M 
PRA Prudential Regulation Authority; www.bankofengland.co.uk/prudential-regulation; B/I 


1.3.2 Timeline of financial regulation 


In this section, we give the major dates which marked the important stages of 
financial regulation. We can consider four periods: before 1980, the years 1980–2000, the 
period until the 2008 Global Financial Crisis and the last 10 years. 


Before 1980 


Before 1980, financial regulation was mainly developed in the US with several acts, 
which were passed after the Great Depression in the 1930s. These acts concern a wide 
range of financial activities, in particular banking, markets and investment sectors. The 
Basel Committee on Banking Supervision was established in 1974. In Europe, two directives 
established a regulatory framework for insurance companies. 

Banking Regulation
  1913        Federal Reserve Act (establishment of the Federal Reserve System as the central banking system of the US)
  1933        Glass-Steagall Act (separation of commercial and investment banking in the US)
  1933        US Banking Act (creation of FDIC and insurance deposit)

BCBS
  1974        Creation of the Basel Committee on Banking Supervision

Solvency I
  1973-07-24  Publication of the non-life insurance directive (73/239/EEC) dedicated to solvency margin requirements
  1979-03-05  Publication of the life insurance directive (79/267/EEC) dedicated to solvency margin requirements

Market Regulation
  1933-05-27  Securities Act (registration and prospectus of securities)
  1934-06-06  Securities Exchange Act (regulation of the secondary markets and creation of the SEC)
  1936-06-15  Commodity Exchange Act (regulation of the commodity futures)
  1939-08-03  Trust Indenture Act (regulation of debt securities)
  1940-08-22  Investment Advisers Act (regulation of investment advisers)
  1940-08-22  Investment Company Act (regulation of mutual funds)
  1974-10-23  Commodity Futures Trading Commission Act (the CFTC replaces the Commodity Exchange Commission)


The years 1980–2000 


The years 1980–2000 were marked by the development of the banking regulation and 
the publication of the Basel Accord dedicated to credit risk. Moreover, the end of the 1990s 
saw the implementation of the regulatory framework concerning market risks. In Europe, 
the UCITS directive was also an important step concerning the investment industry. In the 
US, the insurance regulation was reformed with the risk-based capital framework whereas 
Solvency I was reinforced in Europe. 


Basel I
  1987-12-15  Publication of the consultative paper on the Cooke ratio
  1988-07-04  Publication of the Basel Capital Accord
  1996-01-18  Publication of the amendment to incorporate market risks

CAD
  1993-03-15  Publication of the Capital Adequacy Directive (93/6/EEC) known as CAD I
  1998-06-22  Revision of the CAD (98/31/EEC) known as CAD II

Solvency I
  1988-06-22  Second non-life insurance directive 88/357/EEC
  1990-11-08  Second life insurance directive 90/619/EEC
  1992-06-18  Third non-life insurance directive 92/49/EEC
  1992-11-10  Third life insurance directive 92/96/EEC

RBC
  1990        NAIC created the US RBC regime
  1992        Implementation of RBC in US insurance
  1993        Finalization of the RBC formula for life insurance
  1994        Finalization of the RBC formula for property and casualty insurance
  1998        Finalization of the RBC formula for health insurance

Market Regulation
  1985-12-20  Publication of the first UCITS Directive (85/611/EEC)
  2000-12-14  Commodity Futures Modernization Act (regulation of OTC derivatives in the US)
The years 2000–2008 


In the 2000s, banks and regulators invested significant effort and resources to put 
in place the Basel II framework. It was during this period that modern risk management 
was significantly developed in the banking sector. The Solvency II reform emerged in 2004 
and intensive work was underway to calibrate this new proposition on insurance regulation. 


Basel II
  1999-06-02  Publication of the first CP on Basel II
  2001-01-29  Publication of the second CP on Basel II
  2001-11-05  Results of the QIS 2
  2002-06-25  Results of the QIS 2.5
  2003-04-29  Publication of the third CP on Basel II
  2003-05-05  Results of the QIS 3
  2004-06-10  Publication of the Basel II Accord
  2004-2005   Conduct of QIS 4 (national impact study and tests)
  2005-07-30  Publication of “The Application of Basel II to Trading Activities and the Treatment of Double Default Effects”
  2006-06-16  Results of the QIS 5
  2006-06-30  Publication of the Basel II Comprehensive Version (including Basel I, Basel II and 2005 revisions)

CRD
  2006-05-14  Publication of the directive 2006/48/EC
  2006-05-14  Publication of the directive 2006/49/EC (CRD)

Solvency I
  2002-03-05  Non-life insurance directive 2002/13/EC (revision of solvency margin requirements)
  2002-11-05  Life insurance recast directive 2002/83/EC

Solvency II
  2004        Initial works on Solvency II
  2006-03-17  Report on the first QIS
  2007        Report on the second QIS
  2007-11-01  Report on the third QIS

Market Regulation
  2002-01-22  Publication of the directives 2001/107/EC and 2001/108/EC (UCITS III)
  2004-04-21  Publication of the directive 2004/39/EC (MiFID 1)


The years 2008–2019 


The 2008 Global Financial Crisis completely changed the landscape of financial 
regulation. Under political pressure, we witnessed a frenetic race of regulatory reforms. For 
instance, the Basel Committee had published 21 regulatory standards before 2007. From 
January 2008 to December 2014, this number increased dramatically with 34 new 
regulatory standards. With Basel 2.5, new capital requirements were put in place for market 
risk. The Basel III framework was published at the end of 2010 and introduced new 
standards for managing the liquidity risk. However, the finalized version of the Basel III reforms 
was only published in 2017. In Europe, market regulation became the new hot topic for 
regulators. However, the major event of the beginning of this decade concerns systemic risk. New 
regulations have emerged and new financial activities are under scrutiny (shadow banking 
system, market infrastructures, investment management). 
Basel 2.5
  2007-10-12  Publication of the first CP on the incremental risk charge
  2008-07-22  Proposed revisions to the Basel II market risk framework
  2009-07-13  Publication of the final version of Basel 2.5


Basel III
  2010-12-16  Publication of the original version of Basel III
  2011-06-01  Revised version of the Basel III capital rules reflecting the CVA modification
  2013-01-07  Publication of the rules concerning the liquidity coverage ratio
  2013-10-31  Fundamental review of the trading book (FRTB)
  2013-12-13  Capital requirements for banks’ equity investments in funds
  2014-01-12  Publication of the leverage ratio
  2014-03-31  Publication of SA-CCR
  2014-04-10  Capital requirements for bank exposures to central counterparties
  2014-04-15  Supervisory framework for measuring and controlling large exposures
  2014-10-31  Publication of the rules concerning the net stable funding ratio
  2016-04-21  Interest rate risk in the banking book (IRRBB)
  2016-07-11  Revisions to the securitization framework
  2017-12-07  Final version of Basel III reforms
  2019-01-14  Publication of the Basel III comprehensive version for market risk


CRD/CRR
  2009-09-16  Directive 2009/111/EC (CRD II)
  2010-09-24  Directive 2010/76/EU (CRD III)
  2013-06-26  Directive 2013/36/EU (CRD IV)
  2013-06-26  Publication of the capital requirements regulation 575/2013 (CRR)
  2013-10-15  Council regulation 1024/2013 concerning the European Central Bank and the prudential supervision
  2014-10-10  Commission delegated regulation 2015/62 on the leverage ratio
  2017-12-12  Regulation 2017/2401 on securitizations
  2019        Publication of CRD V & CRR 2


Solvency II
  2008-11-19  Report on the fourth QIS
  2009-11-25  Solvency II directive 2009/138/EC
  2011-03-14  Report on the fifth QIS
  2014-04-16  Publication of the Omnibus II directive 2014/51/EU
  2015-10-10  Publication of the commission delegated regulation 2015/35
  2015-12-02  Commission implementing regulation 2015/2450


Market Regulation
  2009-07-13  Directive 2009/65/EC (UCITS IV)
  2011-06-08  AIFM directive (2011/61/EU)
  2012-07-04  EU regulation 648/2012 (EMIR)
  2014-05-15  Directive 2014/65/EU (MiFID II)
  2014-05-15  EU regulation 600/2014 (MiFIR)
  2014-07-23  Directive 2014/91/EU (UCITS V)
  2014-11-26  EU regulation 1286/2014 (PRIIPS)
  2015-11-25  EU regulation 2015/2365 on securities financing transactions
  2016-06-08  EU regulation 2016/1011 on indices and benchmarks
  2017-06-14  EU regulation 2017/1131 on money market funds

Systemic Risk
  2009-04     Creation of the Financial Stability Board (FSB)
  2010-07-21  Dodd-Frank Wall Street Reform and Consumer Protection Act
  2010-07-21  Volcker Rule (§619 of the Dodd-Frank Act)
  2011-11-04  Publication of the G-SIB assessment methodology (BCBS)
  2013-07-03  Update of the G-SIB assessment methodology (BCBS)
  2015-03-04  Second CP on assessment methodologies for identifying NBNI-SIFIs (FSB-IOSCO)


Taylor & Francis 
Taylor & Francis Group 


http://taylorandfrancis.com 


Part I 


Risk Management in the 
Financial Sector 




Chapter 2 


Market Risk 


This chapter begins with the presentation of the regulatory framework. It will help us to 
understand how the supervision on market risk is organized and how the capital charge is 
computed. Then we will study the different statistical approaches to measure the value-at- 
risk and the expected shortfall. Specifically, a section is dedicated to the risk management of 
derivatives and exotic products. We will see the main concepts, but we will present the more 
technical details later in Chapter 9 dedicated to model risk. Advanced topics like Monte 
Carlo methods and stress testing models will also be addressed in Part II. Finally, the last 
part of the chapter is dedicated to risk allocation. 


2.1 Regulatory framework 


We recall that the original Basel Accord of 1988 only concerned credit risk. However, 
market shocks became more frequent and the rapid development of derivatives 
created some stress events at the end of the eighties and the beginning of the nineties. On 19 
October 1987, stock markets crashed and the Dow Jones Industrial Average index dropped 
by more than 20% in the day. In 1990, the collapse of the Japanese asset price bubble (both 
in stock and real estate markets) caused a lot of damage in the Japanese banking system 
and economy. The unexpected rise of US interest rates in 1994 resulted in a bond market 
massacre and difficulties for banks, hedge funds and money managers. In 1994-1995, several 
financial disasters occurred, in particular the bankruptcy of Barings and the Orange County 
affair (Jorion, 2007). 

In April 1993, the Basel Committee published a first consultative paper to incorporate 
market risk in the Cooke ratio. Two years later, in April 1995, it accepted the idea to 
compute the capital charge for market risks with an internal model. This decision is mainly 
due to the publication of RiskMetrics by J.P. Morgan in October 1994. Finally, the Basel 
Committee published the amendment to the capital accord to incorporate market risks in 
January 1996. This proposal remained the supervisory framework for market risk for 
many years. However, the 2008 Global Financial Crisis had a big impact in terms of market 
risk. Just after the crisis, a new approach called Basel 2.5 was adopted. In 2012, the 
Basel Committee launched a major project: the fundamental review of the trading book 
(FRTB). This work resulted in the publication of a new comprehensive framework in 
January 2019 (BCBS, 2019). This is the Basel III framework for computing the minimum 
capital requirements for market risk as of January 2022. 

According to BCBS (2019), market risk is defined as “the risk of losses (in on- and 
off-balance sheet positions) arising from movements in market prices. The risks subject to 
market risk capital requirements include but are not limited to: 

• default risk, interest rate risk, credit spread risk, equity risk, foreign exchange (FX) 
  risk and commodities risk for trading book instruments; 

• FX risk and commodities risk for banking book instruments.” 

The following table summarizes the perimeter of market risks that require regulatory capital: 


Portfolio   Fixed Income   Equity   Currency   Commodity   Credit
Trading          ✓           ✓         ✓           ✓          ✓
Banking                                ✓           ✓


The Basel Committee makes the distinction between the trading book and the banking 
book. Instruments to be included in the trading book are subject to market risk capital 
requirements, while instruments to be included in the banking book are subject to credit risk 
capital requirements (with the exception of foreign exchange and commodity instruments). 
The trading book refers to positions in assets held with trading intent or for hedging other 
elements of the trading book. These assets are systematically valued on a fair value 
(mark-to-market or mark-to-model) basis, are actively managed and are intentionally held 
for short-term resale. Examples are proprietary trading, market-making activities, hedging 
portfolios of derivatives products, listed equities, repo transactions, etc. The banking book 
refers to positions in assets that are expected to be held until maturity. These assets 
may be valued at their historical cost or with a fair value approach. Examples are unlisted 
equities, real estate holdings, hedge funds, etc. 


The first task of the bank is therefore to define trading book assets and banking book 
assets. For instance, if the bank sells an option on the Libor rate to a client, a capital 
charge for the market risk is required. If the bank provides a personal loan to a client with 
a fixed interest rate, there is a market risk if the interest rate risk is not hedged. However, a 
capital charge is not required in this case, because the exposure concerns the banking book. 
Exposures on stocks may be included in the banking book if the objective is a long-term 
investment. 


2.1.1 The Basel I/II framework 


To compute the capital charge, banks have the choice between two approaches: 
1. the standardized measurement method (SMM); 
2. the internal model-based approach (IMA). 


The standardized measurement method was implemented by banks at the end of 
the nineties. However, banks quickly realized that they could sharply reduce their capital 
requirements by adopting internal models. This explains why SMM was only used by a 
few small banks in the 2000s. 


2.1.1.1 Standardized measurement method 


Five main risk categories are identified: interest rate risk, equity risk, currency risk, 
commodity risk and price risk on options and derivatives. For each category, a capital 
charge is computed to cover the general market risk, but also the specific risk. According 
to the Basel Committee, specific risk includes the risk “that an individual debt or equity 
security moves by more or less than the general market in day-to-day trading and event risk 
(e.g. takeover risk or default risk)”. The use of internal models is subject to the approval 
of the supervisor and the bank can mix the two approaches under some conditions. For 
instance, the bank may use SMM for the specific risk and IMA for the general market risk. 
In this approach, the capital charge $\mathcal{K}$ is equal to the risk exposure $E$ times the capital 
charge weight $K$: 
$$\mathcal{K} = E \cdot K$$


For the specific risk, the risk exposure corresponds to the notional of the instrument, whether 
it is a long or a short position. For the general market risk, long and short positions on 
different instruments can be offset. In what follows, we give the main guidelines and we 
invite the reader to consult BCBS (1996a, 2006) to obtain the computational details. 


Interest rate risk Let us first consider the specific risk. The Basel Committee makes the 
distinction between sovereign and other fixed income instruments. In the case of government 
instruments, the capital charge weights are: 


Rating     AAA to AA-   A+ to BBB-                  BB+ to B-   Below B-   NR
Maturity                0–6M     6M–2Y    2Y+
K          0%           0.25%    1.00%    1.60%     8%          12%        8%


This capital charge depends on the rating and also the residual maturity for A+ to BBB- 
issuers¹. The category NR stands for non-rated issuers. In the case of other instruments 
issued by public sector entities, banks and corporate companies, the capital charge weights 
are: 


Rating     AAA to BBB-                  BB+ to BB-   Below BB-   NR
Maturity   0–6M     6M–2Y    2Y+
K          0.25%    1.00%    1.60%      8%           12%         8%


Example 4 We consider a trading portfolio with the following exposures: a long position 
of $50 mn on Euro-Bund futures, a short position of $100 mn on three-month T-Bills and 
a long position of $10 mn on an investment grade (IG) corporate bond with a three-year 
residual maturity. 


The underlying asset of Euro-Bund futures is a German bond with a long maturity 
(higher than 6 years). We deduce that the capital charge for specific risk for the two sovereign 
exposures is equal to zero, because both Germany and US are rated above A+. Concerning 
the corporate bond, we obtain: 


$$\mathcal{K} = 10 \times 1.60\% = \$160\,000$$
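
The two weight tables and the calculation of Example 4 can be sketched in Python. This is only an illustration under stated assumptions: the function names, the rating scale ordering and the specific ratings assigned to the three positions ("AAA" for Germany, "AA+" for the US, "BBB" for the investment grade bond) are hypothetical choices that are consistent with the example but not given in it, as is the 8-year maturity used for the Euro-Bund.

```python
# Rating scale from best to worst (assumed ordering, used only for comparisons)
RATINGS = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
           "BB+", "BB", "BB-", "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC", "C", "D"]
RANK = {rating: i for i, rating in enumerate(RATINGS)}


def maturity_weight(residual_maturity_years: float) -> float:
    """Weight for the three maturity periods: <= 6M, 6M to 24M, > 24M."""
    if residual_maturity_years <= 0.5:
        return 0.0025
    if residual_maturity_years <= 2.0:
        return 0.0100
    return 0.0160


def specific_risk_weight(rating: str, residual_maturity_years: float,
                         government: bool) -> float:
    """Capital charge weight for specific interest rate risk (tables above)."""
    if rating == "NR":
        return 0.08
    rank = RANK[rating]
    if government:
        if rank <= RANK["AA-"]:
            return 0.0
        if rank <= RANK["BBB-"]:
            return maturity_weight(residual_maturity_years)
        return 0.08 if rank <= RANK["B-"] else 0.12
    if rank <= RANK["BBB-"]:
        return maturity_weight(residual_maturity_years)
    return 0.08 if rank <= RANK["BB-"] else 0.12


# Example 4: (label, notional in $ mn, rating, residual maturity, government?)
portfolio = [
    ("Euro-Bund futures", 50.0, "AAA", 8.0, True),     # rating and maturity assumed
    ("3M T-Bills", -100.0, "AA+", 0.25, True),         # rating assumed
    ("IG corporate bond", 10.0, "BBB", 3.0, False),    # rating assumed
]

# Specific risk applies to the notional, whether the position is long or short
charges = {label: abs(notional) * specific_risk_weight(rating, maturity, gov)
           for label, notional, rating, maturity, gov in portfolio}
total_specific = sum(charges.values())
print(f"Specific risk capital charge: ${total_specific:.2f} mn")
```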


For the general market risk, the bank has the choice between two methods: the maturity 
approach and the duration approach. In the maturity approach, long and short positions are 
slotted into a maturity-based ladder comprising fifteen time-bands (less than one month, 
between one and three months, ... between 12 and 20 years, greater than 20 years). The 
risk weights depend on the time band and the value of the coupon², and apply to the 
net exposure on each time band. For example, a capital charge of 8% is used for the net 


1 Three maturity periods are defined: 6 months or less, greater than 6 months and up to 24 months, more 
than 24 months. 

2 We distinguish coupons less than 3% (small coupons or SC) and coupons 3% or more (big coupons or 
BC). 
exposure of instruments (with small coupons), whose maturity is between 12 and 20 years. 
For reflecting basis and gap risks, the bank must also include a 10% capital charge to 
the smallest exposure of the matched positions. This adjustment is called the ‘vertical 
disallowance’. The Basel Committee considers a second adjustment for horizontal offsetting 
(the ‘horizontal disallowance’). For that, it defines 3 zones (less than 1 year, one year to 
four years and more than four years). The offsetting can be done within and between the 
zones. The adjustment coefficients are 30% within the zones 2 and 3, 40% within the zone 
1, between the zones 1 and 2, and between the zones 2 and 3, and 100% between the zones 
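1 and 3. Therefore, the regulatory capital for the general market risk is the sum of the three 
components: 
$$\mathcal{K} = \mathcal{K}^{\mathrm{OP}} + \mathcal{K}^{\mathrm{VD}} + \mathcal{K}^{\mathrm{HD}}$$
where $\mathcal{K}^{\mathrm{OP}}$, $\mathcal{K}^{\mathrm{VD}}$ and $\mathcal{K}^{\mathrm{HD}}$ are the required capital for the overall net open position, the 
vertical disallowance and the horizontal disallowance. 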
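
The first two components of this sum can be illustrated with a simplified sketch. The class and function names, the three time bands and all figures below are hypothetical. The positions are assumed to be already multiplied by their risk weights, and the horizontal disallowance is passed as an input rather than computed, since its within-zone and between-zone offsetting rules are not implemented here.

```python
from dataclasses import dataclass


@dataclass
class TimeBand:
    label: str
    weighted_long: float   # risk-weighted long position ($ mn)
    weighted_short: float  # risk-weighted short position ($ mn, positive number)


VERTICAL_DISALLOWANCE = 0.10  # 10% of the matched position in each time band


def general_market_risk(bands, k_hd):
    """Return (K_OP, K_VD, K) for the maturity ladder, given K_HD as an input."""
    net_by_band = [b.weighted_long - b.weighted_short for b in bands]
    # Overall net open position: absolute value of the sum of the net positions
    k_op = abs(sum(net_by_band))
    # Vertical disallowance: 10% of the smaller of the long and short positions
    k_vd = sum(VERTICAL_DISALLOWANCE * min(b.weighted_long, b.weighted_short)
               for b in bands)
    return k_op, k_vd, k_op + k_vd + k_hd


# Hypothetical ladder with three time bands only
bands = [
    TimeBand("1-3M", 0.2, 0.0),
    TimeBand("3-4Y", 1.5, 1.0),
    TimeBand("12-20Y", 0.0, 0.8),
]
k_op, k_vd, k_total = general_market_risk(bands, k_hd=0.25)
print(f"K_OP = {k_op:.2f}, K_VD = {k_vd:.2f}, K = {k_total:.2f}")
```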


With the duration approach, the bank computes the price sensitivity of each position 
with respect to a change in yield Ay, slots the sensitivities into a duration-based ladder 
and applies adjustments for vertical and horizontal disallowances. The computation of the 
required capital is exactly the same as previously, but with a different definition of time 
bands and zones. 


Equity risk For equity exposures, the capital charge for specific risk is 4% if the portfolio 
is liquid and well-diversified and 8% otherwise. For the general market risk, the risk weight 
is equal to 8% and applies to the net exposure. 


Example 5 We consider a $100 mn short exposure on the S&P 500 index futures contract 
and a $60 mn long exposure on the Apple stock. 


The capital charge for specific risk is³: 

\[\mathcal{K}^{\mathrm{Specific}} = 100 \times 4\% + 60 \times 8\% = 4 + 4.8 = 8.8\]

The net exposure is -$40 mn. We deduce that the capital charge for the general market 
risk is: 

\[\mathcal{K}^{\mathrm{General}} = \left|-40\right| \times 8\% = 3.2\]


It follows that the total capital charge for this equity portfolio is $12 mn. 
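
The arithmetic of Example 5 can be checked with a few lines of Python. This is only a minimal sketch: the function name `equity_capital_charge` and the representation of a position as a pair (signed exposure, liquid-and-diversified flag) are illustrative assumptions, not part of the Basel text.

```python
def equity_capital_charge(positions):
    """Standardized equity charge for a list of (signed exposure in $ mn, diversified flag).

    Specific risk: 4% of the absolute exposure if the position is liquid and
    well-diversified, 8% otherwise. General market risk: 8% of the absolute
    net exposure.
    """
    specific = sum(abs(x) * (0.04 if diversified else 0.08) for x, diversified in positions)
    net_exposure = sum(x for x, _ in positions)
    general = abs(net_exposure) * 0.08
    return specific, general, specific + general


# Example 5: short $100 mn on S&P 500 futures (diversified), long $60 mn on Apple (not diversified)
specific, general, total = equity_capital_charge([(-100.0, True), (60.0, False)])
print(f"specific = {specific:.2f}, general = {general:.2f}, total = {total:.2f}")
```

Keeping the sign of each exposure matters only for the general market risk, where long and short positions are netted before the 8% weight is applied.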


Remark 1 Under Basel 2.5, the capital charge for specific risk is set to 8% whatever the 
liquidity of the portfolio. 


Foreign exchange risk The Basel Committee includes gold in this category and not in 
the commodity category because of its specificity in terms of volatility and its status of 
safe-haven currency. The bank must first calculate the net position (long or short) of each 
currency. The capital charge is then 8% of the global net position defined as the sum of: 


3We assume that the S&P 500 index is liquid and well-diversified, whereas the exposure on the Apple 
stock is not diversified. 
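• the maximum between the aggregated value \(\mathcal{L}_{\mathrm{FX}}\) of long positions and the aggregated 
value \(\mathcal{S}_{\mathrm{FX}}\) of short positions and, 


• the absolute value of the net position \(\mathcal{N}_{\mathrm{Gold}}\) in gold. 


We have: 

\[\mathcal{K} = 8\% \times \left(\max\left(\mathcal{L}_{\mathrm{FX}}, \mathcal{S}_{\mathrm{FX}}\right) + \left|\mathcal{N}_{\mathrm{Gold}}\right|\right)\]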


Example 6 We consider a bank which has the following long and short positions expressed 
in $ mn⁴: 


Currency   EUR   JPY   GBP   CHF   CAD   AUD   ZAR   Gold 
L_i        170     0    25    37    11     3     8     33 
S_i         80    50    12     9    28     0     8      6 


We first compute the net exposure \(\mathcal{N}_i\) for each currency: 

\[\mathcal{N}_i = \mathcal{L}_i - \mathcal{S}_i\]

We obtain the following figures: 


Currency   EUR   JPY   GBP   CHF   CAD   AUD   ZAR   Gold 
N_i         90   -50    13    28   -17     3     0     27 


We then calculate the aggregated long and short positions: 


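\[\begin{aligned}
\mathcal{L}_{\mathrm{FX}} &= 90 + 13 + 28 + 3 + 0 = 134 \\
\mathcal{S}_{\mathrm{FX}} &= 50 + 17 = 67 \\
\mathcal{N}_{\mathrm{Gold}} &= 27
\end{aligned}\]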


We finally deduce that the capital charge is equal to $12.88 mn: 


\[\mathcal{K} = 8\% \times \left(\max\left(134, 67\right) + \left|27\right|\right) = 8\% \times 161 = 12.88\]
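
The steps of Example 6 can be sketched as follows. The function name `fx_capital_charge` and the dictionary layout are hypothetical choices made for illustration; the 8% weight and the treatment of gold are those stated above.

```python
def fx_capital_charge(long_pos, short_pos, gold_key="Gold", weight=0.08):
    """Capital charge for foreign exchange risk (positions in $ mn).

    The charge is 8% of max(aggregated long, aggregated short) of the net
    currency positions, plus 8% of the absolute net position in gold.
    """
    net = {ccy: long_pos[ccy] - short_pos[ccy] for ccy in long_pos}
    net_gold = net.pop(gold_key, 0.0)  # gold is treated separately
    agg_long = sum(v for v in net.values() if v > 0)
    agg_short = sum(-v for v in net.values() if v < 0)
    return weight * (max(agg_long, agg_short) + abs(net_gold))


# Long and short positions of Example 6
longs = {"EUR": 170, "JPY": 0, "GBP": 25, "CHF": 37, "CAD": 11, "AUD": 3, "ZAR": 8, "Gold": 33}
shorts = {"EUR": 80, "JPY": 50, "GBP": 12, "CHF": 9, "CAD": 28, "AUD": 0, "ZAR": 8, "Gold": 6}
fx_charge = fx_capital_charge(longs, shorts)
print(f"FX capital charge = {fx_charge:.2f}")
```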


Commodity risk Commodity risk concerns both physical and derivative positions (forward, 
futures⁵ and options). This includes energy products (oil, gas, ethanol, etc.), agricultural 
products (grains, oilseeds, fiber, livestock, etc.) and metals (industrial and precious), 
but excludes gold which is covered under foreign exchange risk. The Basel Committee makes 
the distinction between the risk of spot or physical trading, which is mainly affected by the 
directional risk and the risk of derivative trading, which includes the directional risk, the 
basis risk, the cost-of-carry and the forward gap (or time spread) risk. The SMM for com- 
modity risk includes two options: the simplified approach and the maturity ladder approach. 


Under the simplified approach, the capital charge for directional risk is 15% of the 
absolute value of the net position in each commodity. For the other three risks, the capital 
charge is equal to 3% of the global gross position. We have: 


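\[\mathcal{K} = 15\% \times \sum_{i=1}^{m} \left|\mathcal{L}_i - \mathcal{S}_i\right| + 3\% \times \sum_{i=1}^{m} \left(\mathcal{L}_i + \mathcal{S}_i\right)\]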


4We implicitly assume that the reporting currency of the bank is the US dollar. 

5The most traded futures contracts are crude oil, brent, heating oil, gas oil, natural gas, RBOB gasoline, 
silver, platinum, palladium, zinc, lead, aluminium, cocoa, soybeans, corn, cotton, wheat, sugar, live cattle, 
coffee and soybean oil. 
where \(m\) is the number of commodities, \(\mathcal{L}_i\) is the long position on commodity \(i\) and \(\mathcal{S}_i\) is 
the short position on commodity \(i\). 


Example 7 We consider a portfolio of five commodities. The mark-to-market exposures 
expressed in $ mn are the following: 


Commodity   Crude Oil   Coffee   Natural Gas   Cotton   Sugar 
L_i                23        5             3        8      11 
S_i                 0        0            19        2       6 


The aggregated net exposure \(\sum_{i=1}^{5} \left|\mathcal{L}_i - \mathcal{S}_i\right|\) is equal to $55 mn whereas the gross 
exposure \(\sum_{i=1}^{5} \left(\mathcal{L}_i + \mathcal{S}_i\right)\) is equal to $77 mn. We deduce that the required capital is 
15% × 55 + 3% × 77 or $10.56 mn. 
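
As a quick check of Example 7, the simplified approach can be written in a few lines. The function name `commodity_simplified_charge` is an illustrative assumption; the 15% and 3% weights are those of the formula above.

```python
def commodity_simplified_charge(long_pos, short_pos):
    """Simplified approach for commodity risk (positions in $ mn).

    Returns (capital charge, aggregated net exposure, gross exposure):
    15% of the sum of absolute net positions plus 3% of the gross position.
    """
    net = sum(abs(l - s) for l, s in zip(long_pos, short_pos))
    gross = sum(l + s for l, s in zip(long_pos, short_pos))
    return 0.15 * net + 0.03 * gross, net, gross


# Example 7: crude oil, coffee, natural gas, cotton, sugar
commodity_charge, net_exposure, gross_exposure = commodity_simplified_charge(
    [23, 5, 3, 8, 11], [0, 0, 19, 2, 6]
)
print(f"net = {net_exposure}, gross = {gross_exposure}, charge = {commodity_charge:.2f}")
```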

Under the maturity ladder approach, the bank should spread long and short exposures of 
each commodity to seven time bands: 0-1M, 1M-3M, 3M-6M, 6M-1Y, 1Y-2Y, 2Y-3Y, 3Y+. 
For each time band, the capital charge for the basis risk is equal to 1.5% of the matched 
positions (long and short). Nevertheless, the residual net position of previous time bands 
may be carried forward to offset exposures in next time bands. In this case, a surcharge 
of 0.6% of the residual net position is added at each time band to cover the time spread 
risk. Finally, a capital charge of 15% is applied to the global net exposure (or the residual 
unmatched position) for directional risk. 


Option’s market risk There are three approaches for the treatment of options and 
derivatives. The first method, called the simplified approach, consists of calculating sepa- 
rately the capital charge of the position for the option and the associated underlying. In the 
case of a hedged exposure (long cash and long put, short cash and long call), the required 
capital is the standard capital charge of the cash exposure less the in-the-money amount 
of the option. In the case of a non-hedged exposure, the required capital is the minimum 
value between the mark-to-market of the option and the standard capital charge for the 
underlying. 


Example 8 We consider a variant of Example 5. We have a $100 mn short exposure on the 
S&P 500 index futures contract and a $60 mn long exposure on the Apple stock. We assume 
that the current stock price of Apple is $120. Six months ago, we bought 400 000 put 
options on Apple with a strike of $130 and a one-year maturity. We also decide to buy 10 000 
ATM call options on Google. The current stock price of Google is $540 and the market value 
of the option is $45.5. 


We deduce that we have 500 000 shares of the Apple stock. This implies that $48 mn 
of the long exposure on Apple is hedged by the put options. Concerning the derivative 
exposure on Google, the market value is equal to $0.455 mn. We can therefore decompose 
this portfolio into three main exposures: 


• a directional exposure composed of the $100 mn short exposure on the S&P 500 index 
and the $12 mn remaining long exposure on the Apple stock; 


• a $48 mn hedged exposure on the Apple stock; 


• a $0.455 mn derivative exposure on the Google stock. 
For the directional exposure, we compute the capital charge for specific and general market 
risks⁶: 

\[\mathcal{K} = \left(100 \times 4\% + 12 \times 8\%\right) + 88 \times 8\% = 4.96 + 7.04 = 12\]


For the hedged exposure, we proceed as previously but we deduct the in-the-money value⁷: 

\[\mathcal{K} = 48 \times \left(8\% + 8\%\right) - 4 = 3.68\]


The market value of the Google options is $0.455 mn. We compare this value to the standard 
capital charge⁸ to determine the capital charge: 

\[\mathcal{K} = \min\left(5.4 \times 16\%, 0.455\right) = 0.455\]


We finally deduce that the required capital is $16.135 mn. 
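
The three pieces of Example 8 can be reproduced with the short script below. It is a sketch of the simplified approach as described in this section only; the variable names are illustrative, and all amounts are expressed in $ mn.

```python
# Directional exposure: short $100 mn S&P 500 (4% specific) and $12 mn residual Apple (8% specific),
# plus 8% general market risk on the absolute net exposure
directional = (100 * 0.04 + 12 * 0.08) + abs(-100 + 12) * 0.08

# Hedged exposure: 400 000 Apple shares covered by puts (strike $130, spot $120)
n_puts, strike, apple_spot = 400_000, 130.0, 120.0
hedged_exposure = n_puts * apple_spot / 1e6                # $48 mn
in_the_money = n_puts * max(strike - apple_spot, 0) / 1e6  # $4 mn
hedged = hedged_exposure * (0.08 + 0.08) - in_the_money

# Non-hedged options: 10 000 ATM calls on Google (spot $540, premium $45.5)
n_calls, google_spot, premium = 10_000, 540.0, 45.5
market_value = n_calls * premium / 1e6                             # $0.455 mn
standard_charge = n_calls * google_spot / 1e6 * (0.08 + 0.08)      # 5.4 x 16%
unhedged = min(standard_charge, market_value)

total_capital = directional + hedged + unhedged
print(f"{directional:.3f} + {hedged:.3f} + {unhedged:.3f} = {total_capital:.3f}")
```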


The second approach is the delta-plus method. In this case, the directional exposure 
of the option is calculated by its delta. Banks are also required to compute an additional 
capital charge for gamma and vega risks. We consider different options and we note \(j \in \mathcal{A}_i\) 
when the option \(j\) is written on the underlying asset \(i\). We first compute the (signed) capital 
charge for the four risks at the asset level: 

\[\begin{aligned}
\mathcal{K}_i^{\mathrm{Specific}} &= \left(\sum_{j \in \mathcal{A}_i} N_j \cdot \Delta_j\right) \cdot S_i \cdot K_i^{\mathrm{Specific}} \\
\mathcal{K}_i^{\mathrm{General}} &= \left(\sum_{j \in \mathcal{A}_i} N_j \cdot \Delta_j\right) \cdot S_i \cdot K_i^{\mathrm{General}} \\
\mathcal{K}_i^{\mathrm{Gamma}} &= \frac{1}{2} \left(\sum_{j \in \mathcal{A}_i} N_j \cdot \Gamma_j\right) \cdot \left(S_i \cdot K_i^{\mathrm{Gamma}}\right)^2 \\
\mathcal{K}_i^{\mathrm{Vega}} &= \sum_{j \in \mathcal{A}_i} N_j \cdot \upsilon_j \cdot \left(25\% \cdot \Sigma_j\right)
\end{aligned}\]

where \(S_i\) is the current market value of the asset \(i\), \(K_i^{\mathrm{Specific}}\) and \(K_i^{\mathrm{General}}\) are the 
corresponding standard capital charges for specific and general market risk and \(K_i^{\mathrm{Gamma}}\) is the 
capital charge for gamma impact⁹. Here, \(N_j\), \(\Delta_j\), \(\Gamma_j\) and \(\upsilon_j\) are the exposure, delta, gamma 
and vega of the option \(j\). For the vega risk, the shift corresponds to ±25% of the implied 
volatility \(\Sigma_j\). For a portfolio of assets, the traditional netting rules apply to specific and 
general market risks. The total capital charge for gamma risk corresponds to the opposite 
of the sum of the negative individual capital charges for gamma risk whereas the total 
capital charge for vega risk corresponds to the sum of the absolute value of individual capital 
charges for vega risk. 


6The net short exposure is equal to $88 mn. 

7It is equal to 400 000 × max (130 - 120, 0). 

8It is equal to 10 000 × 540 × (8% + 8%). 

9It is equal to 8% for equities, 8% for currencies and 15% for commodities. In the case of interest rate 
risk, it corresponds to the standard value \(K(t)\) for the time band \(t\) (see the table on page 8 in BCBS 
(1996a)). 




Example 9 We consider a portfolio of 4 options written on stocks with the following char- 
acteristics: 


Option   Stock   Exposure   Type   Price   Strike   Maturity   Volatility 
1        A             -5   call     100      110       1.00          20% 
2        A            -10   call     100      100       2.00          20% 
3        B             10   call     200      210       1.00          30% 
4        B              8   put      200      190       1.25          35% 


This means that we have 2 assets. For stock A, we have a short exposure on 5 call options 
with a one-year maturity and a short exposure on 10 call options with a two-year maturity. 
For stock B, we have a long exposure on 10 call options with a one-year maturity and a 
long exposure on 8 put options with a maturity of one year and three months. 


Using the Black-Scholes model, we first compute the Greek coefficients for each option \(j\). 
Because the options are written on single stocks, the capital charges \(K_i^{\mathrm{Specific}}\), \(K_i^{\mathrm{General}}\) and 
\(K_i^{\mathrm{Gamma}}\) are all equal to 8%. Using the previous formulas, we then deduce the individual 
capital charges for each option¹⁰: 


j                    1        2        3        4 
Δ_j               0.45     0.69     0.56    -0.31 
Γ_j               0.02     0.01     0.01     0.00 
υ_j              39.58    49.91    78.85    79.25 
K_j^Specific    -17.99   -55.18    89.79   -40.11 
K_j^General     -17.99   -55.18    89.79   -40.11 
K_j^Gamma        -3.17    -3.99     8.41     4.64 
K_j^Vega         -9.89   -24.96    59.14    55.48 


We can now aggregate the previous individual capital charges for each stock. We obtain: 


Stock   K^Specific   K^General   K^Gamma   K^Vega 
A           -73.16      -73.16     -7.16   -34.85 
B            49.69       49.69     13.05   114.61 
Total       122.85       23.47      7.16   149.46 


To compute the total capital charge, we apply the netting rule for the general market risk, 
but not for the specific risk. This means that \(\mathcal{K}^{\mathrm{Specific}} = |-73.16| + |49.69| = 122.85\) and 
\(\mathcal{K}^{\mathrm{General}} = |-73.16 + 49.69| = 23.47\). For gamma risk, we only consider negative impacts 
and we have \(\mathcal{K}^{\mathrm{Gamma}} = |-7.16| = 7.16\). For vega risk, there is no netting rule: \(\mathcal{K}^{\mathrm{Vega}} = 
|-34.85| + |114.61| = 149.46\). We finally deduce that the overall capital is 302.94. 
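
The aggregation rules of Example 9 can be sketched as follows. The inputs are the rounded individual capital charges of the four options, so the totals match the figures above only up to rounding (a few cents). The function name `aggregate_delta_plus` and the tuple layout are illustrative assumptions.

```python
from collections import defaultdict

# (stock, specific, general, gamma, vega) for each option, rounded values from Example 9
option_charges = [
    ("A", -17.99, -17.99, -3.17, -9.89),
    ("A", -55.18, -55.18, -3.99, -24.96),
    ("B", 89.79, 89.79, 8.41, 59.14),
    ("B", -40.11, -40.11, 4.64, 55.48),
]


def aggregate_delta_plus(charges):
    """Aggregate signed option-level charges into the four delta-plus components."""
    by_stock = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for stock, *values in charges:
        for k, value in enumerate(values):
            by_stock[stock][k] += value
    specific = sum(abs(v[0]) for v in by_stock.values())      # no netting across stocks
    general = abs(sum(v[1] for v in by_stock.values()))       # netting across stocks
    gamma = -sum(min(v[2], 0.0) for v in by_stock.values())   # negative impacts only
    vega = sum(abs(v[3]) for v in by_stock.values())          # no netting
    return specific, general, gamma, vega


k_specific, k_general, k_gamma, k_vega = aggregate_delta_plus(option_charges)
k_total = k_specific + k_general + k_gamma + k_vega
print(f"{k_specific:.2f} {k_general:.2f} {k_gamma:.2f} {k_vega:.2f} total = {k_total:.2f}")
```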


The third method is the scenario approach. In this case, we evaluate the profit and loss 
(P&L) for simultaneous changes in the underlying price and in the implied volatility of the 
option. For defining these scenarios, the ranges are the standard shifts used previously. For 
instance, we use the following ranges for equities: 


Underlying price \(S_i\): from -8% to +8% 
Implied volatility \(\Sigma_j\): from -25% to +25% 


10For instance, the individual capital charge of the second option for the gamma risk is 
\(\mathcal{K}_2^{\mathrm{Gamma}} = \frac{1}{2} \times (-10) \times 0.0125 \times \left(100 \times 8\%\right)^2 = -3.99\). 
The scenario matrix corresponds to intermediate points on the 2 × 2 grid. For each cell of 
the scenario matrix, we calculate the P&L of the option exposure¹¹. The capital charge is 
then the largest loss. 
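
To fix ideas, the scenario approach can be sketched for a single call option exposure. The sketch below rests on several assumptions that are not specified in the text: the option is repriced with the Black-Scholes formula, the interest rate is set to zero, the grid has 7 price points and 3 volatility points, and the function names are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, maturity, vol, rate=0.0):
    """Black-Scholes price of a European call (rate = 0 is an assumption)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


def scenario_capital(n_options, spot, strike, maturity, vol,
                     price_shift=0.08, vol_shift=0.25, n_price=7, n_vol=3):
    """Largest loss of the option exposure over the grid of price and volatility shifts."""
    base = bs_call(spot, strike, maturity, vol)
    worst_pnl = 0.0
    for i in range(n_price):
        ds = -price_shift + 2 * price_shift * i / (n_price - 1)
        for j in range(n_vol):
            dv = -vol_shift + 2 * vol_shift * j / (n_vol - 1)
            shocked = bs_call(spot * (1 + ds), strike, maturity, vol * (1 + dv))
            worst_pnl = min(worst_pnl, n_options * (shocked - base))
    return -worst_pnl


# Long exposure on 10 calls (spot 200, strike 210, one year, 30% volatility)
print(f"{scenario_capital(10, 200.0, 210.0, 1.0, 0.30):.2f}")
```

For a long call, the largest loss is reached at the corner where both the price and the volatility are shifted down; for a short call, it is reached at the opposite corner.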


Securitization instruments The treatment of specific risk of securitization positions 
was revised in Basel 2.5 and is based on external ratings. For instance, the capital charge 
for securitization exposures is 1.6% if the instrument is rated from AAA to AA-. For 
resecuritization exposures, it is equal to 3.2%. If the rating of the instrument is from BB+ 
to BB-, the capital charges become respectively¹² 28% and 52%. 


2.1.1.2 Internal model-based approach 


The use of an internal model is conditional upon the approval of the supervisory 
authority. The bank must meet certain criteria concerning different topics. These 
criteria concern the risk management system, the specification of market risk factors, the 
properties of the internal model, the stress testing framework, the treatment of the specific 
risk and the backtesting procedure. In particular, the Basel Committee considers that the 
bank must have “sufficient numbers of staff skilled in the use of sophisticated models not 
only in the trading area but also in the risk control, audit, and if necessary, back office 
areas”. We notice that the Basel Committee first insists on the quality of the trading de- 
partment, meaning that the trader is the first level of risk management. The validation of 
an internal model does not therefore only concern the risk management department, but 
the bank as a whole. 


Qualitative criteria BCBS (1996a) defines the following qualitative criteria: 


• “The bank should have an independent risk control unit that is responsible for the 
design and implementation of the bank’s risk management system. [...] This unit 
must be independent from business trading units and should report directly to senior 
management of the bank”. 


• The risk management department produces and analyzes daily reports, is responsible 
for the backtesting procedure and conducts stress testing analysis. 


• The internal model must be used to manage the risk of the bank on a daily basis. It 
must be complemented by trading limits expressed in risk exposure. 


• The bank must document internal policies, controls and procedures concerning the 
risk measurement system (including the internal model). 


Today, it is obvious that the risk management department should not report to the 
trading and sales department. Twenty-five years ago, this was not the case. Most risk 
management units were incorporated into business units. This has completely changed because of 
the regulation, and risk management is now independent from the front office. The risk 
management function really emerged with the amendment to incorporate market risks and 
even more with the Basel II reform, whereas the finance function has long been developed 
in banks. For instance, it is only recently that the head of risk management¹³ has become a member 
of the executive committee of the bank, whereas the head of the finance department¹⁴ has 
always been part of the top management. 


11It may include the cash exposure if the option is used for hedging purposes. 
12See pages 4-7 of BCBS (2009b) for the other risk capital charges. 

13He is called the chief risk officer or CRO. 

14He is called the chief financial officer or CFO. 
From the supervisory point of view, an internal model is not limited to measuring the 
risk. It must be integrated in the management of the risk. This is why the Basel Committee 
points out the importance of the link between the outputs of the model (or the risk measure), the 
organization of the risk management and the impact on the business. 


Quantitative criteria The choice of the internal model is left to the bank, but it must 
respect the following quantitative criteria: 


• The value-at-risk (VaR) is computed on a daily basis with a 99% confidence level. The 
minimum holding period of the VaR is 10 trading days. If the bank computes a VaR 
with a shorter holding period, it can use the square-root-of-time rule. 


• The risk measure can take into account diversification, that is the correlations between 
the risk categories. 


• The model must capture the relevant risk factors and the bank must pay attention to 
the specification of the appropriate set of market risk factors. 


• The sample period for calculating the value-at-risk is at least one year and the bank 
must update the data set frequently (every month at least). 


• In the case of options, the model must capture the non-linear effects with respect to 
the risk factors and the vega risk. 


• “Each bank must meet, on a daily basis, a capital requirement expressed as the higher 
of (i) its previous day’s value-at-risk number [...] and (ii) an average of the daily 
value-at-risk measures on each of the preceding sixty business days, multiplied by a 
multiplication factor”. 


• The value of the multiplication factor depends on the quality of the internal model 
with a range between 3 and 4. The quality of the internal model is related to its 
ex-post performance measured by the backtesting procedure. 


The holding period to define the capital is 10 trading days. However, it is difficult 
to compute the value-at-risk for such a holding period. In practice, the bank computes the 
one-day value-at-risk and converts this number into a ten-day value-at-risk using the 
square-root-of-time rule: 

\[\mathrm{VaR}_{\alpha}\left(w; \text{ten days}\right) = \sqrt{10} \times \mathrm{VaR}_{\alpha}\left(w; \text{one day}\right)\]

This rule comes from the scaling property of the volatility associated with a geometric 
Brownian motion. It has the advantage of being simple and objective, but it generally underestimates 
the risk when the loss distribution exhibits fat tails¹⁵. 


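The required capital at time \(t\) is equal to: 

\[\mathcal{K}_t = \max\left(\mathrm{VaR}_{t-1}, \left(3 + \xi\right) \cdot \frac{1}{60} \sum_{i=1}^{60} \mathrm{VaR}_{t-i}\right) \tag{2.1}\]

where \(\mathrm{VaR}_t\) is the value-at-risk calculated at time \(t\) and \(\xi\) is the penalty coefficient 
(\(0 \le \xi \le 1\)). In normal periods where \(\mathrm{VaR}_{t-1} \approx \mathrm{VaR}_{t-i}\), the required capital is the average of 
the last 60 value-at-risk values times the multiplication factor¹⁶ \(m_c = 3 + \xi\). In this case, 
we have: 

\[\mathcal{K}_t = \mathcal{K}_{t-1} + \frac{m_c}{60} \cdot \left(\mathrm{VaR}_{t-1} - \mathrm{VaR}_{t-61}\right)\]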
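
Equation (2.1) and the square-root-of-time rule translate directly into code. The following is a minimal sketch; the function names `scale_to_ten_days` and `required_capital`, and the convention that the last element of the history is the previous day's VaR, are assumptions made for illustration.

```python
from math import sqrt


def scale_to_ten_days(one_day_var):
    """Square-root-of-time rule: convert a one-day VaR into a ten-day VaR."""
    return sqrt(10.0) * one_day_var


def required_capital(var_history, xi=0.0):
    """Required capital of Equation (2.1).

    var_history holds the (ten-day) VaR figures in chronological order, the
    last element being the previous day's VaR; xi is the penalty coefficient.
    """
    if len(var_history) < 60:
        raise ValueError("at least 60 daily VaR figures are required")
    if not 0.0 <= xi <= 1.0:
        raise ValueError("the penalty coefficient must lie between 0 and 1")
    average_60 = sum(var_history[-60:]) / 60.0
    return max(var_history[-1], (3.0 + xi) * average_60)


# Normal period: the capital is three times the average VaR
print(required_capital([10.0] * 60))
# Stress: the previous day's VaR dominates the averaged term
print(required_capital([1.0] * 59 + [100.0]))
```

When the averaged term is binding on two consecutive days, the difference between the two capital figures reduces to the recursion given above.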


15See for instance Diebold et al. (1998), Daníelsson and Zigrand (2006) or Wang et al. (2011). 
16The complementary factor is explained on page 88. 
FIGURE 2.1: Calculation of the required capital with the VaR 


The impact of \(\mathrm{VaR}_{t-1}\) is limited because the factor \((3 + \xi)/60\) is smaller than 6.7%. The 
required capital can only be equal to the previous day’s value-at-risk if the bank faces a 
stress \(\mathrm{VaR}_{t-1} \gg \mathrm{VaR}_{t-i}\). We also notice that a shock on the VaR vanishes after 60 trading 
days. To understand the calculation of the capital, we report an illustration in Figure 2.1. 
The solid line corresponds to the value-at-risk \(\mathrm{VaR}_t\) whereas the dashed line corresponds 
to the capital \(\mathcal{K}_t\). We assume that \(\xi = 0\), meaning that the multiplication factor is equal to 
3. When t < 120, the value-at-risk varies around a constant. The capital is then relatively 
smooth and is three times the average VaR. At time t = 120, we observe a shock on the 
value-at-risk, which lasts 20 days. The capital increases immediately, as long as t < 140. Indeed, 
at this time, the capital takes into account the full period of the shocked VaR (between 
t = 120 and t = 139). The full effect of this stressed period continues until t < 180, but this 
effect becomes partial when t > 180. The impact of the shock vanishes when t = 200. We 
then observe a period of 100 days where the capital is smooth because the daily 
value-at-risk does not change a lot. A second shock on the value-at-risk occurs at time t = 300, but 
the magnitude of the shock is larger than previously. During 10 days, the required capital 
is exactly equal to the previous day’s value-at-risk. After 10 days, the bank succeeds in 
reducing the risk of its portfolio. However, the daily value-at-risk increases from t = 310 to 
t = 500. As previously, the impact of the second shock vanishes 60 days after the end of the 
shock. However, the capital increases strongly at the end of the period. This is due to the 
effect of the multiplication factor \(m_c\) on the value-at-risk. 


Stress testing Stress testing is a simulation method to identify events that could have 
a great impact on the soundness of the bank. The framework consists of applying stress 
scenarios and low-probability events to the trading portfolio of the bank and evaluating 
the maximum loss. Contrary to the value-at-risk¹⁷, stress testing is not used to compute the 


17The 99% VaR is considered as a risk measure in normal markets and therefore ignores stress events. 
required capital. The underlying idea is rather to identify the adverse scenarios for the bank, 
evaluate the corresponding losses, reduce where necessary the exposures that are too risky and anticipate 
the management of such stress periods. 


Stress tests should incorporate both market and liquidity risks. The Basel Committee 
considers two types of stress tests: 


1. supervisory stress scenarios; 
2. stress scenarios developed by the bank itself. 


The supervisory stress scenarios are standardized and apply to the different banks. This 
allows the supervisors to compare the vulnerability of the different banks. The bank 
must complement them with its own scenarios in order to evaluate the vulnerability of its 
portfolio according to the characteristics of the portfolio. In particular, the bank may be 
exposed to some political risks, regional risks or market risks that are not taken into account 
by standardized scenarios. The banks must report their test results to the supervisors on a 
quarterly basis. 


Stress scenarios may be historical or hypothetical. In the case of historical scenarios, the 
bank computes the worst-case loss associated with different crises: the Black Monday (1987), 
the European monetary system crisis (1992), the bond market sell-off (1994), the internet 
bubble (2000), the subprime mortgage crisis (2007), the liquidity crisis due to the Lehman 
Brothers collapse (2008), the eurozone crisis (2011-2012), etc. Hypothetical scenarios are 
more difficult to calibrate, because they must correspond to extreme but also plausible 
events. Moreover, the multidimensional aspect of stress scenarios is an issue. Indeed, the 
stress scenario is defined by the extreme event, but the corresponding loss is evaluated with 
respect to the shocks on market risk factors. For instance, if we consider a severe Middle East 
crisis, this event will have a direct impact on the oil price, but also indirect impacts on other 
market risk factors (equity prices, US dollar, interest rates). Whereas historical scenarios 
are objective, hypothetical scenarios are by construction subjective and their calibration 
will differ from one financial institution to another. In the case of the Middle East crisis, 
one bank may consider that the oil price could fall by 30% whereas another bank may use 
a price reduction of 50%. 


In 2009, the Basel Committee revised the market risk framework. In particular, it 
introduced the stressed value-at-risk measure. The stressed VaR has the same characteristics as 
the traditional VaR (99% confidence level and 10-day holding period), but the model inputs 
are “calibrated to historical data from a continuous 12-month period of significant financial 
stress relevant to the bank’s portfolio”. For instance, a typical period is the year 2008, which 
combines both the subprime mortgage crisis and the Lehman Brothers bankruptcy. This 
implies that the historical period to compute the SVaR is completely different from the 
historical period to compute the VaR (see Figure 2.2). In Basel 2.5, the capital requirement 
for stressed VaR is: 


\[\mathcal{K}_t^{\mathrm{SVaR}} = \max\left(\mathrm{SVaR}_{t-1}, m_s \cdot \frac{1}{60} \sum_{i=1}^{60} \mathrm{SVaR}_{t-i}\right)\]

where \(\mathrm{SVaR}_t\) is the stressed VaR measure computed at time \(t\). Like the coefficient \(m_c\), 
the multiplication factor \(m_s\) for the stressed VaR is also calibrated with respect to the 
backtesting outcomes, meaning that we have \(m_s = m_c\) in many cases. 


FIGURE 2.2: Two different periods to compute the VaR and the SVaR 


Specific risk and other risk charges In the case where the internal model does not take 
into account the specific risk, the bank must compute a specific risk charge (SRC) using 
the standardized measurement method. To be validated as a value-at-risk measure with 
specific risks, the model must satisfy at least the following criteria: it captures concentrations 
(magnitude and changes in composition), it captures name-related basis and event risks and 
it considers the assessment of the liquidity risk. For instance, an internal model built with a 
general market risk factor¹⁸ does not capture specific risk. Indeed, the risk exposure of the 
portfolio is entirely determined by the beta of the portfolio with respect to the market risk 
factor. This implies that two portfolios with the same beta but with a different composition, 
concentration or liquidity have the same value-at-risk. 


Basel 2.5 established a new capital requirement “in response to the increasing amount 
of exposure in banks’ trading books to credit-risk related and often illiquid products whose 
risk is not reflected in value-at-risk” (BCBS, 2009b). The incremental risk charge (IRC) 
measures the impact of rating migrations and defaults, corresponds to a 99.9% value-at- 
risk for a one-year time horizon and concerns portfolios of credit vanilla trading (bonds 
and CDS). The IRC may be incorporated into the internal model or it may be treated 
as a surcharge from a separate calculation. Also under Basel 2.5, the Basel Committee 
introduced the comprehensive risk measure (CRM), which corresponds to a supplementary 
capital charge for credit exotic trading portfolios¹⁹. The CRM is also a 99.9% value-at-risk 
for a one-year time horizon. For IRC and CRM, the capital charge is the maximum between 
the most recent risk measure and the average of the risk measure over 12 weeks²⁰. We 
finally obtain the following formula to compute the capital charge for the market risk under 
Basel 2.5: 

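\[\mathcal{K}_t = \mathcal{K}_t^{\mathrm{VaR}} + \mathcal{K}_t^{\mathrm{SVaR}} + \mathcal{K}_t^{\mathrm{SRC}} + \mathcal{K}_t^{\mathrm{IRC}} + \mathcal{K}_t^{\mathrm{CRM}}\]

where \(\mathcal{K}_t^{\mathrm{VaR}}\) is given by Equation (2.1) and \(\mathcal{K}_t^{\mathrm{SRC}}\) is the specific risk charge. In this formula, 
\(\mathcal{K}_t^{\mathrm{SRC}}\) and/or \(\mathcal{K}_t^{\mathrm{IRC}}\) may be equal to zero if the modeling of these two risks is included in 
the value-at-risk internal model. 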


Backtesting and the ex-post evaluation of the internal model The backtesting 
procedure is described in the document Supervisory Framework for the Use of Backtesting 
in Conjunction with the Internal Models Approach to Market Risk Capital Requirements 
published by the Basel Committee in January 1996. It consists of verifying that the internal 
model is consistent with a 99% confidence level. The idea is then to compare the outcomes 
of the risk model with realized loss values. For instance, we expect that the realized loss 
exceeds the VaR figure once every 100 observations on average. 


The backtesting is based on the one-day holding period and compares the previous day’s 
value-at-risk with the daily realized profit and loss. An exception occurs if the loss exceeds 
the value-at-risk. For a given period, we compute the number of exceptions. Depending on the 
frequency of exceptions, the supervisor determines the value of the penalty coefficient between 


18This is the case of the capital asset pricing model (CAPM) developed by Sharpe (1964). 

19This concerns correlation trading activities on credit derivatives. 

20Contrary to the VaR and SVaR measures, the risk measure is not scaled by a multiplication factor for 
IRC and CRM. 
0 and 1. In the case of a sample based on 250 trading days, the Basel Committee defines three 
zones and proposes the values given in Table 2.1. The green zone corresponds to a number 
of exceptions less than or equal to 4. In this case, the Basel Committee considers that there is no 
problem and the penalty coefficient \(\xi\) is set to 0. If the number of exceptions belongs to the 
yellow zone (between 5 and 9 exceptions), it may indicate that the confidence level of the 
internal model could be lower than 99% and implies that \(\xi\) is greater than zero. For instance, 
if the number of exceptions for the last 250 trading days is 6, the Basel Committee proposes 
that the penalty coefficient \(\xi\) is set to 0.50, meaning that the multiplication coefficient \(m_c\) 
is equal to 3.50. The red zone is a concern. In this case, the supervisor must investigate the 
reasons for such a large number of exceptions. If the problem comes from the relevance of the 
model, the supervisor can invalidate the internal model-based approach. 


TABLE 2.1: Value of the penalty coefficient ξ for a sample of 250 observations 


Zone     Number of exceptions      ξ 
Green    0-4                    0.00 
Yellow   5                      0.40 
         6                      0.50 
         7                      0.65 
         8                      0.75 
         9                      0.85 
Red      10+                    1.00 
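
Table 2.1 amounts to a simple lookup. The sketch below encodes it for a sample of 250 observations; the function names are illustrative assumptions.

```python
def penalty_coefficient(n_exceptions):
    """Penalty coefficient of Table 2.1 (sample of 250 trading days)."""
    if n_exceptions < 0:
        raise ValueError("the number of exceptions cannot be negative")
    if n_exceptions <= 4:  # green zone
        return 0.0
    yellow_zone = {5: 0.40, 6: 0.50, 7: 0.65, 8: 0.75, 9: 0.85}
    return yellow_zone.get(n_exceptions, 1.0)  # red zone from 10 exceptions


def multiplication_factor(n_exceptions):
    """Multiplication factor, equal to 3 plus the penalty coefficient."""
    return 3.0 + penalty_coefficient(n_exceptions)


print(multiplication_factor(6))
```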


The definition of the color zones comes from the statistical analysis of the exception 
frequency. We note \(w\) the portfolio, \(L_t(w)\) the daily loss at time \(t\) and \(\mathrm{VaR}_{\alpha}(w; h)\) the 
value-at-risk calculated at time \(t - 1\). By definition, \(L_t(w)\) is the opposite of the P&L 
\(\Pi_t(w)\): 

\[L_t(w) = -\Pi_t(w) = \mathrm{MtM}_{t-1} - \mathrm{MtM}_t\]

where \(\mathrm{MtM}_t\) is the mark-to-market of the trading portfolio at time \(t\). By definition, we have: 

\[\Pr\left\{L_t(w) > \mathrm{VaR}_{\alpha}(w; h)\right\} = 1 - \alpha\]

where \(\alpha\) is the confidence level of the value-at-risk. Let \(e_t\) be the random variable which is 
equal to 1 if there is an exception and 0 otherwise. \(e_t\) is a Bernoulli random variable with 
parameter \(p\): 

\[p = \Pr\left\{e_t = 1\right\} = \Pr\left\{L_t(w) > \mathrm{VaR}_{\alpha}(w; h)\right\} = 1 - \alpha\]

In the case of the Basel framework, \(\alpha\) is set to 99% meaning that we have a probability of 
1% to observe an exception every trading day. For a given period \([t_1, t_2]\) of \(n\) trading days, 
the probability to observe exactly \(m\) exceptions is given by the binomial formula: 

\[\Pr\left\{N_e(t_1; t_2) = m\right\} = \binom{n}{m} \left(1 - \alpha\right)^m \alpha^{n-m}\]

where \(N_e(t_1; t_2) = \sum_{t=t_1}^{t_2} e_t\) is the number of exceptions for the period \([t_1, t_2]\). We obtain 
this result under the assumption that the exceptions are independent across time. \(N_e(t_1; t_2)\) 
is then the binomial random variable \(\mathcal{B}(n; 1 - \alpha)\). We deduce that the probability to have 
up to \(m\) exceptions is: 

\[\Pr\left\{N_e(t_1; t_2) \le m\right\} = \sum_{j=0}^{m} \binom{n}{j} \left(1 - \alpha\right)^j \alpha^{n-j}\]


The three previous zones are then defined with respect to the statistical confidence level 
of the assumption \(\mathcal{H}: \alpha = 99\%\). The green zone corresponds to the 95% confidence level: 
\(\Pr\{N_e(t_1; t_2) \le m\} < 95\%\). In this case, the hypothesis \(\mathcal{H}: \alpha = 99\%\) is not rejected at 
the 95% confidence level. The yellow and red zones are respectively defined by \(95\% \le 
\Pr\{N_e(t_1; t_2) \le m\} < 99.99\%\) and \(\Pr\{N_e(t_1; t_2) \le m\} \ge 99.99\%\). This implies that the 
hypothesis \(\mathcal{H}: \alpha = 99\%\) is rejected at the 99.99% confidence level if the number of exceptions 
belongs to the red zone. 


TABLE 2.2: Probability distribution (in %) of the number of exceptions (n = 250 trading days) 


              α = 99%                      α = 98% 
 m    Pr{N_e = m}   Pr{N_e ≤ m}    Pr{N_e = m}   Pr{N_e ≤ m} 
 0          8.106         8.106          0.640         0.640 
 1         20.469        28.575          3.268         3.908 
 2         25.742        54.317          8.303        12.211 
 3         21.495        75.812         14.008        26.219 
 4         13.407        89.219         17.653        43.872 
 5          6.663        95.882         17.725        61.597 
 6          2.748        98.630         14.771        76.367 
 7          0.968        99.597         10.507        86.875 
 8          0.297        99.894          6.514        93.388 
 9          0.081        99.975          3.574        96.963 
10          0.020        99.995          1.758        98.720 


If we apply the previous statistical analysis when \(n\) is equal to 250 trading days, we 
obtain the results given in Table 2.2. For instance, the probability to have zero exceptions 
is 8.106%, the probability to have one exception is 20.469%, etc. We retrieve the three 
color zones determined by the Basel Committee. The green zone corresponds to the interval 
[0, 4], the yellow zone is defined by the interval [5, 9] and the red zone involves the interval 
[10, 250]. We notice that the color zones can vary significantly if the confidence level of 
the value-at-risk is not equal to 99%. For instance, if it is equal to 98%, the green zone 
corresponds to less than 9 exceptions. In Figure 2.3, we have reported the color zones with 
respect to the size \(n\) of the sample. 


Example 10 Calculate the color zones when \(n\) is equal to 1000 trading days and \(\alpha = 99\%\). 


We have \(\Pr\{N_e \le 14\} = 91.759\%\) and \(\Pr\{N_e \le 15\} = 95.213\%\). This implies that the 
green zone ends at 14 exceptions whereas the yellow zone begins at 15 exceptions. Because 
\(\Pr\{N_e \le 23\} = 99.989\%\) and \(\Pr\{N_e \le 24\} = 99.996\%\), we also deduce that the red zone 
begins at 24 exceptions. 
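
The color zones can be recovered numerically from the binomial distribution. The sketch below uses only the Python standard library; the function names `prob_at_most` and `color_zones` are illustrative, and the 95% and 99.99% thresholds are those used in the definition of the zones.

```python
from math import comb


def prob_at_most(m, n, alpha):
    """Probability of observing at most m exceptions over n independent trading days."""
    p = 1.0 - alpha
    return sum(comb(n, j) * p**j * alpha**(n - j) for j in range(m + 1))


def color_zones(n, alpha=0.99, yellow_level=0.95, red_level=0.9999):
    """Return the first number of exceptions of the yellow zone and of the red zone."""
    yellow_start = red_start = None
    for m in range(n + 1):
        cdf = prob_at_most(m, n, alpha)
        if yellow_start is None and cdf >= yellow_level:
            yellow_start = m
        if cdf >= red_level:
            red_start = m
            break
    return yellow_start, red_start


for n_days in (250, 1000):
    yellow, red = color_zones(n_days)
    print(f"n = {n_days}: green 0-{yellow - 1}, yellow {yellow}-{red - 1}, red {red}+")
```

The cumulative probability is recomputed from scratch for each value of m, which is wasteful but keeps the correspondence with the formula transparent.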


FIGURE 2.3: Color zones of the backtesting procedure (α = 99%) 


Remark 2 The statistical approach of backtesting ignores the effects of intra-day trading. 
Indeed, we make the assumption that the portfolio remains unchanged from t - 1 to t, which 
is not the case in practice. This is why the Basel Committee proposes to compute the loss 
in two different ways. The first approach uses the official realized P&L, whereas the second 
approach consists in separating the P&L of the previous day’s portfolio and the P&L due to 
the intra-day trading activities. 


2.1.2 The Basel III framework 


The finalization of the reform for computing the market risk capital charge has taken
considerable time. After the 2008 crisis, market risk was revised by the Basel Committee,
which added new capital charges (Basel 2.5) to those defined in the Basel I
framework. At the same time, the Basel Committee published a new framework called Basel
III, which focused on liquidity and leverage risks. In 2013, the Basel Committee launched a
vast project called the fundamental review of the trading book (FRTB). For a long time,
the banking industry believed that these discussions were the basis of new reforms in order
to prepare a Basel IV Accord. However, the Basel Committee argued that these changes
simply complete the Basel III reforms. As for the Basel I Accord, banks have the choice
between two approaches for computing the capital charge:


1. a standardized method (SA-TB²¹);
2. an internal model-based approach (IMA). 


Contrary to the previous framework, the SA-TB method is very important even if banks
calculate the capital charge with the IMA method. Indeed, the bank must implement SA-TB
in order to meet the output floor requirement²², which is set at 72.5% in January 2027.


²¹ TB means trading book.
²² The mechanism of capital floor is explained on page 22.

2.1.2.1 Standardized approach


The standardized capital charge is the sum of three components: sensitivity-based 
method capital, the default risk capital (DRC) and the residual risk add-on (RRAO). The 
first component must be viewed as the pure market risk and is the equivalent of the capital 
charge for the general market risk in the Basel I Accord. The second component captures the 
jump-to-default risk (JTD) and replaces the specific risk that we find in the Basel I frame- 
work. The last component captures specific risks that are difficult to measure in practice. 


Sensitivity-based capital requirement This method consists in calculating a capital
charge for delta, vega and curvature risks, and then aggregating the three capital requirements:

$$\mathcal{K} = \mathcal{K}^{\mathrm{Delta}} + \mathcal{K}^{\mathrm{Vega}} + \mathcal{K}^{\mathrm{Curvature}}$$


Seven risk classes are defined by the Basel Committee: (1) general interest rate risk (GIRR), 
(2) credit spread risk (CSR) on non-securitization products, (3) CSR on non-correlation 
trading portfolio (non-CTP), (4) CSR on correlation trading portfolio (CTP), (5) equity 
risk, (6) commodity risk and (7) foreign exchange risk. The sensitivities of the different 
instruments of one risk class are risk-weighted and then aggregated. The first level of ag- 
gregation concerns the risk buckets, defined as risk factors with common characteristics. 
For example, the bucket #1 for credit spread risk corresponds to all instruments that are 
exposed to the IG sovereign credit spread. The second level of aggregation is done by con- 
sidering the different buckets that compose the risk class. For example, the credit spread 
risk is composed of 18 risk buckets (8 investment grade buckets, 7 high yield buckets, 2 
index buckets and one other sector bucket). 


For delta and vega components, we begin by calculating the weighted sensitivity of
each risk factor $F_j$:

$$\mathrm{WS}_j = S_j \cdot \mathrm{RW}_j$$

where $S_j$ and $\mathrm{RW}_j$ are the net sensitivity of the portfolio with respect to the risk factor
and the risk weight of $F_j$. More precisely, we have $S_j = \sum_i S_{i,j}$ where $S_{i,j}$ is the sensitivity
of the instrument $i$ with respect to $F_j$. Second, we calculate the capital requirement for the
risk bucket $B_k$:

$$\mathcal{K}_{B_k} = \sqrt{\max\left(\sum_{j} \mathrm{WS}_j^2 + \sum_{j' \neq j} \rho_{j,j'} \, \mathrm{WS}_j \, \mathrm{WS}_{j'}, 0\right)}$$

where $F_j \in B_k$. We recognize the formula of a standard deviation²³. Finally, we aggregate
the different buckets for a given risk class²⁴:

$$\mathcal{K}^{\mathrm{Delta/Vega}} = \sqrt{\sum_{k} \mathcal{K}_{B_k}^2 + \sum_{k' \neq k} \gamma_{k,k'} \, \mathrm{WS}_{B_k} \, \mathrm{WS}_{B_{k'}}}$$

where $\mathrm{WS}_{B_k} = \sum_{j \in B_k} \mathrm{WS}_j$ is the weighted sensitivity of the bucket $B_k$. Again, we recognize
the formula of a standard deviation. Therefore, the capital requirement for delta and vega
risks can be viewed as a Gaussian risk measure with the following parameters:

1. the sensitivities $S_j$ of the risk factors that are calculated by the bank;
2. the risk weights $\mathrm{RW}_j$ of the risk factors;
3. the correlation $\rho_{j,j'}$ between risk factors within a bucket;
4. the correlation $\gamma_{k,k'}$ between the risk buckets.

²³ The variance is floored at zero, because the correlation matrix formed by the cross-correlations
$\rho_{j,j'}$ is not necessarily positive definite.
²⁴ If the term under the square root is negative, the Basel Committee proposes an alternative formula.
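The two aggregation levels described above can be sketched in a few lines. This is a minimal illustration and not a regulatory implementation: the function names and the input layout (lists of weighted sensitivities and nested lists for the correlation matrices) are assumptions, the cross terms are summed over all ordered pairs of distinct indices, and the alternative formula mentioned in footnote 24 is not covered.

```python
from math import sqrt


def weighted_sensitivities(s, rw):
    """WS_j = S_j * RW_j for each risk factor of a bucket."""
    return [s_j * rw_j for s_j, rw_j in zip(s, rw)]


def bucket_capital(ws, rho):
    """Bucket capital: square root of the floored 'variance' of the WS_j."""
    n = len(ws)
    variance = sum(w * w for w in ws)
    variance += sum(rho[j][jp] * ws[j] * ws[jp]
                    for j in range(n) for jp in range(n) if jp != j)
    return sqrt(max(variance, 0.0))  # floor at zero, as in footnote 23


def risk_class_capital(buckets, gamma):
    """Aggregate buckets; `buckets` is a list of (ws, rho) pairs."""
    k_b = [bucket_capital(ws, rho) for ws, rho in buckets]
    ws_b = [sum(ws) for ws, _ in buckets]  # WS of each bucket
    m = len(buckets)
    total = sum(k * k for k in k_b)
    total += sum(gamma[k][kp] * ws_b[k] * ws_b[kp]
                 for k in range(m) for kp in range(m) if kp != k)
    if total < 0.0:
        # The alternative Basel formula (footnote 24) is not sketched here.
        raise ValueError("negative term under the square root")
    return sqrt(total)
```

For two uncorrelated risk factors with weighted sensitivities 3 and 4, `bucket_capital` returns 5, which is the usual standard deviation rule.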


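For the curvature risk, the methodology is different because it is based on two adverse
scenarios. We note $P_i(F_j)$ the price of the instrument $i$ when the current level of the risk
factor is $F_j$. We calculate $P_i^{+}(F_j) = P_i(F_j + \Delta F_j^{+})$ and $P_i^{-}(F_j) = P_i(F_j - \Delta F_j^{-})$, the
price of instrument $i$ when the risk factor is shocked upward by $\Delta F_j^{+}$ and downward by
$\Delta F_j^{-}$. The curvature risk capital requirement for the risk factor $F_j$ is equal to:

$$\mathrm{CVR}_j^{\pm} = -\sum_i \left(P_i^{\pm}(F_j) - P_i(F_j) \mp S_{i,j} \cdot \mathrm{RW}_j\right)$$

where $S_{i,j}$ is the delta sensitivity²⁵ of instrument $i$ with respect to the risk factor $F_j$ and
$\mathrm{RW}_j$ is the curvature risk weight of $F_j$. $\mathrm{CVR}_j^{+}$ and $\mathrm{CVR}_j^{-}$ play the role of $\mathrm{WS}_j$ in the
delta/vega capital computation. The capital requirement for the bucket (or risk class) $B_k$
is:

$$\mathcal{K}_{B_k}^{\pm} = \sqrt{\max\left(\sum_{j} \left(\max\left(\mathrm{CVR}_j^{\pm}, 0\right)\right)^2 + \sum_{j' \neq j} \rho_{j,j'} \, \psi\left(\mathrm{CVR}_j^{\pm}, \mathrm{CVR}_{j'}^{\pm}\right), 0\right)}$$

where $\psi(\mathrm{CVR}_j, \mathrm{CVR}_{j'})$ is equal to 0 if the two arguments are both negative or is equal
to $\mathrm{CVR}_j \times \mathrm{CVR}_{j'}$ otherwise. Then, the capital requirement for the risk bucket $B_k$ is the
maximum of the two adverse scenarios:

$$\mathcal{K}_{B_k} = \max\left(\mathcal{K}_{B_k}^{+}, \mathcal{K}_{B_k}^{-}\right)$$

At this stage, one scenario is selected: the upward scenario if $\mathcal{K}_{B_k}^{+} > \mathcal{K}_{B_k}^{-}$ or the downward
scenario if $\mathcal{K}_{B_k}^{+} < \mathcal{K}_{B_k}^{-}$. We then define the curvature risk $\mathrm{CVR}_{B_k}$ for each bucket as follows:

$$\mathrm{CVR}_{B_k} = \mathbb{1}\left\{\mathcal{K}_{B_k}^{+} > \mathcal{K}_{B_k}^{-}\right\} \cdot \sum_{j \in B_k} \mathrm{CVR}_j^{+} + \mathbb{1}\left\{\mathcal{K}_{B_k}^{+} < \mathcal{K}_{B_k}^{-}\right\} \cdot \sum_{j \in B_k} \mathrm{CVR}_j^{-}$$

Finally, the capital requirement for the curvature risk is equal to:

$$\mathcal{K}^{\mathrm{Curvature}} = \sqrt{\max\left(\sum_{k} \mathcal{K}_{B_k}^2 + \sum_{k' \neq k} \gamma_{k,k'} \, \psi\left(\mathrm{CVR}_{B_k}, \mathrm{CVR}_{B_{k'}}\right), 0\right)}$$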


We conclude that we use the same methodology for delta, vega and curvature risks with 
three main differences: the computation of the sensitivities, the scale of risk weights, and 
the use of two scenarios for the curvature risk. 
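The curvature computation for a single bucket can be sketched as follows. The helper names and the input layout are hypothetical, the cross terms are summed over all ordered pairs of distinct risk factors, and a tie between the two scenarios is treated literally as in the indicator formula of the text (both indicators vanish); the sketch does not claim to reproduce the regulatory tie-breaking rule.

```python
from math import sqrt


def psi(x, y):
    """psi is 0 when both arguments are negative, and x * y otherwise."""
    return 0.0 if (x < 0 and y < 0) else x * y


def scenario_capital(cvr, rho):
    """Bucket capital for one scenario (all CVR_j^+ or all CVR_j^-)."""
    n = len(cvr)
    total = sum(max(c, 0.0) ** 2 for c in cvr)
    total += sum(rho[j][jp] * psi(cvr[j], cvr[jp])
                 for j in range(n) for jp in range(n) if jp != j)
    return sqrt(max(total, 0.0))


def curvature_bucket(cvr_up, cvr_down, rho):
    """Return (K_B, CVR_B) for a bucket, following the scenario selection."""
    k_up = scenario_capital(cvr_up, rho)
    k_down = scenario_capital(cvr_down, rho)
    if k_up > k_down:
        return k_up, sum(cvr_up)
    if k_up < k_down:
        return k_down, sum(cvr_down)
    return k_up, 0.0  # tie: both indicator functions are equal to zero
```

The pair returned by `curvature_bucket` is what the final aggregation across buckets needs: the bucket capital and the curvature risk of the selected scenario.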

The first step consists in defining the risk factors. The Basel Committee gives a very
precise list of risk factors by asset class (BCBS, 2019). For instance, the equity delta risk
factors are the equity spot prices and the equity repo rates, the equity vega risk factors
are the implied volatilities of options, and the equity curvature risk factors are the equity
spot prices. We retrieve the notions of delta, vega and gamma that we encounter in the
theory of options. In the case of the interest rate risk class (GIRR), the risk factors include
the yield curve²⁶, a flat curve of market-implied inflation rates for each currency and some
cross-currency basis risks. For the other categories, the delta risk factors are credit spread
curves, commodity spot prices and exchange rates. As for equities, vega and curvature risk
factors correspond to implied volatilities of options and aggregated delta risk factors.

²⁵ For FX and equity risk classes, $S_{i,j}$ is the delta sensitivity of instrument $i$. For the other risk classes,
$S_{i,j}$ is the sum of delta sensitivities of instrument $i$ with respect to the risk factor $F_j$.


The second step consists in calculating the sensitivities. The equity delta sensitivity of
the instrument $i$ with respect to the equity risk factor $F_j$ is given by:

$$S_{i,j} = \Delta_i(F_j) \cdot F_j$$

where $\Delta_i(F_j)$ measures the (discrete) delta²⁷ of the instrument $i$ by shocking the equity risk
factor $F_j$ by 1%. If the instrument $i$ corresponds to a stock, the sensitivity is exactly the price
of this stock when the risk factor is the stock price, and zero otherwise. If the instrument $i$
corresponds to a European option on this stock, the sensitivity is the traditional delta of
the option times the stock price. The previous formula is also valid for FX and commodity
risks. For interest rate and credit risks, the delta corresponds to the PV01, that is a change
of the interest rate and credit spread by 1 bp. For the vega sensitivity, we have:

$$S_{i,j} = \upsilon_i(F_j) \cdot F_j$$

where $F_j$ is the implied volatility.
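The discrete delta sensitivity can be illustrated with a generic pricing function. The sketch below assumes a relative shock of 1% on the risk factor; `price` is a hypothetical callable that returns the value of the instrument for a given level of the risk factor.

```python
def delta_sensitivity(price, f, shock=0.01):
    """Discrete delta sensitivity S = Delta(F) * F for a relative shock."""
    delta = (price((1.0 + shock) * f) - price(f)) / (shock * f)
    return delta * f


if __name__ == "__main__":
    # A stock whose price is the risk factor itself: S equals the stock price.
    print(delta_sensitivity(lambda f: f, 50.0))
```

For a stock, the pricing function is the identity and the sensitivity is the stock price up to floating-point error, as stated in the text.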

The third step consists in calculating the risk-weighted sensitivities $\mathrm{WS}_j$. For that, we
use the tables given in BCBS (2019). For example, the risk weight for the 3M interest rate
is equal to 1.7% while the risk weight for the 30Y interest rate is equal to 1.1% (BCBS,
2019, Table 1, page 38). For equity spot prices, the risk weight goes from 15% for large
cap DM indices to 70% for small cap EM stocks (BCBS, 2019, Table 10, page 47). The
fourth step computes the capital charge for each bucket. In this case, we need the ‘factor’
correlations $\rho_{j,j'}$ between the risk factors within the same bucket. For example, the yield
curve correlations between the 10 tenors of the same currency are given in Table 2 on page 38
in BCBS (2019). For the equity risk, $\rho_{j,j'}$ goes from 7.5% to 80%. Finally, we can compute the
capital by considering the ‘bucket’ correlations. For example, $\gamma_{k,k'}$ is set to 50% between the
different currencies in the case of the interest rate risk. We must note that the values given by
the Basel Committee correspond to a medium correlation scenario. The Basel Committee
observes that correlations may increase or decrease in periods of market stress, and
requires the bank to use the maximum of the capital requirements under three correlation
scenarios: medium, high and low. Under the high correlation scenario, the correlations are
increased: $\rho_{j,j'}^{\mathrm{High}} = \min\left(1.25 \times \rho_{j,j'}, 1\right)$ and $\gamma_{k,k'}^{\mathrm{High}} = \min\left(1.25 \times \gamma_{k,k'}, 1\right)$. Under the low
correlation scenario, the correlations are decreased: $\rho_{j,j'}^{\mathrm{Low}} = \max\left(2 \times \rho_{j,j'} - 1, 0.75 \times \rho_{j,j'}\right)$
and $\gamma_{k,k'}^{\mathrm{Low}} = \max\left(2 \times \gamma_{k,k'} - 1, 0.75 \times \gamma_{k,k'}\right)$. Figure 2.4 shows how the medium correlation
is scaled to high and low correlation scenarios.
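The scaling of the medium correlation can be written directly from the two formulas above. The function names are illustrative; `capital` stands for any hypothetical function that maps a correlation level to a capital requirement.

```python
def high_correlation(c):
    """High scenario: min(1.25 * c, 1)."""
    return min(1.25 * c, 1.0)


def low_correlation(c):
    """Low scenario: max(2 * c - 1, 0.75 * c)."""
    return max(2.0 * c - 1.0, 0.75 * c)


def worst_case_capital(capital, c):
    """Maximum of the capital over the medium, high and low scenarios."""
    return max(capital(c), capital(high_correlation(c)),
               capital(low_correlation(c)))
```

For a medium correlation of 50%, the high and low scenarios give 62.5% and 37.5%. For 90%, they give 100% and 80%.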


²⁶ The risk factors correspond to the following tenors of the yield curve: 3M, 6M, 1Y, 2Y, 3Y, 5Y, 10Y,
15Y, 20Y and 30Y.
²⁷ It follows that:

$$S_{i,j} = \frac{P_i(1.01 \cdot F_j) - P_i(F_j)}{1.01 \cdot F_j - F_j} \cdot F_j = \frac{P_i(1.01 \cdot F_j) - P_i(F_j)}{0.01}$$

FIGURE 2.4: High, medium and low correlation scenarios


Default risk capital The gross jump-to-default (JTD) risk is computed by differentiating
long and short exposures²⁸:

$$\mathrm{JTD}^{\mathrm{Long}} = \max\left(N \cdot \mathrm{LGD} + \Pi, 0\right)$$

and:

$$\mathrm{JTD}^{\mathrm{Short}} = \min\left(N \cdot \mathrm{LGD} + \Pi, 0\right)$$

where $N$ is the notional, LGD is the loss given default²⁹ and $\Pi$ is the current P&L. Then,
we offset long and short exposures to the same obligor under some conditions of seniority
and maturity. At this stage, we obtain net JTD exposures, which can be positive (long) or
negative (short). Three buckets are defined: (1) corporates, (2) sovereigns and (3) local
governments and municipalities. For each bucket $B_k$, the capital charge is calculated as
follows:

$$\mathcal{K}_{B_k}^{\mathrm{DRC}} = \max\left(\sum_{i \in \mathrm{Long}} \mathrm{RW}_i \cdot \mathrm{JTD}_i^{\mathrm{Net}} - \mathrm{HBR} \cdot \sum_{i \in \mathrm{Short}} \mathrm{RW}_i \cdot \left|\mathrm{JTD}_i^{\mathrm{Net}}\right|, 0\right) \qquad (2.2)$$


where the risk weight depends on the rating of the obligor: 


Rating AAA AA A BBB BB B CCC NR 
RW 0.5% 2% 3% 6% 15% 30% 50% 15% 


and HBR is the hedge benefit ratio:

$$\mathrm{HBR} = \frac{\sum_{i \in \mathrm{Long}} \mathrm{JTD}_i^{\mathrm{Net}}}{\sum_{i \in \mathrm{Long}} \mathrm{JTD}_i^{\mathrm{Net}} + \sum_{i \in \mathrm{Short}} \left|\mathrm{JTD}_i^{\mathrm{Net}}\right|}$$

²⁸ A long exposure implies that the default results in a loss, whereas a short exposure implies that the
default results in a gain.
²⁹ The default values are 100% for equity and non-senior debt instruments, 75% for senior debt instruments,
25% for covered bonds and 0% for FX instruments.


At first sight, Equation (2.2) seems to be complicated. In order to better understand this
formula, we assume that there is no short credit exposure and the P&L of each instrument
is equal to zero. Therefore, the capital charge for the bucket $B_k$ is equal to:

$$\mathcal{K}_{B_k}^{\mathrm{DRC}} = \sum_{i \in B_k} \underbrace{N_i \cdot \mathrm{LGD}_i}_{\mathrm{EAD}_i} \cdot \mathrm{RW}_i$$


We recognize the formula for computing the credit risk capital when we replace the exposure
at default by the product of the notional and the loss given default. In the case of a portfolio
of loans, the exposures are always positive. In the case of a trading portfolio, we face
more complex situations because we can have both long and short credit exposures. The
introduction of the hedge benefit ratio makes it possible to mitigate the risk of long credit exposures.
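Equation (2.2) and the hedge benefit ratio can be sketched as follows. The risk weights are those of the rating table given in the text. The function names, the convention that a short position carries a negative notional, and the treatment of an empty bucket (HBR set to zero) are assumptions of this illustration. The offsetting rules that turn gross exposures into net exposures are not modelled.

```python
# Risk weights by rating, taken from the table in the text.
RISK_WEIGHT = {"AAA": 0.005, "AA": 0.02, "A": 0.03, "BBB": 0.06,
               "BB": 0.15, "B": 0.30, "CCC": 0.50, "NR": 0.15}


def gross_jtd(notional, lgd, pnl):
    """Return (JTD long, JTD short) for one position."""
    x = notional * lgd + pnl
    return max(x, 0.0), min(x, 0.0)


def drc_bucket(net_jtd, rw):
    """Equation (2.2) for one bucket; net_jtd > 0 is long, < 0 is short."""
    long_sum = sum(j for j in net_jtd if j > 0)
    short_sum = sum(-j for j in net_jtd if j < 0)
    denom = long_sum + short_sum
    hbr = long_sum / denom if denom > 0 else 0.0  # hedge benefit ratio
    long_rw = sum(w * j for j, w in zip(net_jtd, rw) if j > 0)
    short_rw = sum(w * -j for j, w in zip(net_jtd, rw) if j < 0)
    return max(long_rw - hbr * short_rw, 0.0)
```

With a single long BBB exposure of notional 100, an LGD of 75% and a zero P&L, the charge reduces to $N \cdot \mathrm{LGD} \cdot \mathrm{RW} = 4.5$.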


Remark 3 The previous framework is valid for non-securitization instruments. For secu- 
ritization, a similar approach is followed, but the LGD factor disappears in order to avoid 
double counting. Moreover, the treatment of offsetting differs for non-CTP and CTP prod- 
ucts. 


Residual risk add-on The idea of this capital charge is to capture market risks which are
not taken into account by the two previous methods. Residual risks concern instruments
with an exotic underlying (weather, natural disasters, longevity, etc.), payoffs that are not
a linear combination of vanilla options (spread options, basket options, best-of, worst-of,
etc.), or products that present significant gap, correlation or behavioral risks (digital options,
barrier options, embedded options, etc.). We have:

$$\mathcal{K}^{\mathrm{RRAO}} = \sum_i N_i \cdot \mathrm{RW}_i$$

where $\mathrm{RW}_i$ is equal to 1% for instruments with an exotic underlying and 10 bps for the
other residual risks.


2.1.2.2 Internal model-based approach 


As in the first Basel Accord, the Basel III framework includes general criteria, qualitative 
standards, quantitative criteria, backtesting procedures and stress testing approaches. The 
main difference concerning general criteria is the introduction of trading desks. According 
to BCBS (2019), a trading desk is “an unambiguously defined group of traders or trading 
accounts that implements a well-defined business strategy operating within a clear risk 
management structure”. Internal models are implemented at the trading desk level. Within 
a bank, some trading desks are then approved for the use of internal models, while other 
trading desks must use the SA-TB approach. The Basel Committee reinforces the role of the 
model validation unit, the process of the market risk measurement system (documentation, 
annual independent review, etc.) and the use of stress scenarios. 


Capital requirement for modellable risk factors Concerning capital requirements,
the value-at-risk at the 99% confidence level is replaced by the expected shortfall at the
97.5% confidence level. Moreover, the 10-day holding period is not valid for all instruments.
Indeed, the expected shortfall must take into account the liquidity risk and we have:

$$\mathrm{ES}_\alpha(w) = \sqrt{\sum_{k=1}^{5} \left(\mathrm{ES}_\alpha(w; h_k) \sqrt{\frac{h_k - h_{k-1}}{h_1}}\right)^2}$$

where:

• $\mathrm{ES}_\alpha(w; h_1)$ is the expected shortfall of the portfolio $w$ at horizon 10 days by considering
all risk factors;

• $\mathrm{ES}_\alpha(w; h_k)$ is the expected shortfall of the portfolio $w$ at horizon $h_k$ days by considering
the risk factors $F_j$ that belong to the liquidity class $k$;

• $h_k$ is the horizon of the liquidity class $k$, which is given in Table 2.3 ($h_0$ is set to zero).


TABLE 2.3: Liquidity horizon (Basel III)

Liquidity class k    Liquidity horizon h_k
        1                     10
        2                     20
        3                     40
        4                     60
        5                    120


This expected shortfall framework is valid for modellable risk factors. Within this frame- 
work, all instruments are classified into 5 buckets (10, 20, 40, 60 and 120 days), which are 
defined by BCBS (2019) as follows: 


1. Interest rates (specified currencies³⁰ and domestic currency of the bank), equity prices
(large caps), FX rates (specified currency pairs³¹).


2. Interest rates (unspecified currencies), equity prices (small caps) and volatilities (large 
caps), FX rates (currency pairs), credit spreads (IG sovereigns), commodity prices 
(energy, carbon emissions, precious metals, non-ferrous metals). 


3. FX rates (other types), FX volatilities, credit spreads (IG corporates and HY 
sovereigns). 


4. Interest rates (other types), IR volatility, equity prices (other types) and volatilities 
(small caps), credit spreads (HY corporates), commodity prices (other types) and 
volatilities (energy, carbon emissions, precious metals, non-ferrous metals). 


5. Credit spreads (other types) and credit spread volatilities, commodity volatilities and 
prices (other types). 


The expected shortfall must reflect the risk measure for a period of stress. For that, the
Basel Committee proposes an indirect approach:

$$\mathrm{ES}_\alpha(w; h) = \mathrm{ES}_\alpha^{(\mathrm{reduced,stress})}(w; h) \cdot \max\left(\frac{\mathrm{ES}_\alpha^{(\mathrm{full,current})}(w; h)}{\mathrm{ES}_\alpha^{(\mathrm{reduced,current})}(w; h)}, 1\right)$$

where $\mathrm{ES}_\alpha^{(\mathrm{full,current})}$ is the expected shortfall based on the current period with the full set
of risk factors, $\mathrm{ES}_\alpha^{(\mathrm{reduced,current})}$ is the expected shortfall based on the current period with
a restricted set of risk factors and $\mathrm{ES}_\alpha^{(\mathrm{reduced,stress})}$ is the expected shortfall based on the
stress period³² with the restricted set of risk factors. The Basel Committee recognizes that
it is difficult to calculate directly $\mathrm{ES}_\alpha^{(\mathrm{full,stress})}(w; h)$ on the stress period with the full set of
risk factors. Therefore, the previous formula assumes that there is a proportionality factor
between the full set and the restricted set of risk factors³³:

$$\frac{\mathrm{ES}_\alpha^{(\mathrm{full,stress})}(w; h)}{\mathrm{ES}_\alpha^{(\mathrm{full,current})}(w; h)} \approx \frac{\mathrm{ES}_\alpha^{(\mathrm{reduced,stress})}(w; h)}{\mathrm{ES}_\alpha^{(\mathrm{reduced,current})}(w; h)}$$

³⁰ The specified currencies are composed of EUR, USD, GBP, AUD, JPY, SEK and CAD.
³¹ They correspond to the 20 most liquid currencies: USD, EUR, JPY, GBP, AUD, CAD, CHF, MXN,
CNY, NZD, RUB, HKD, SGD, TRY, KRW, SEK, ZAR, INR, NOK and BRL.

Example 11 In the table below, we have calculated the 10-day expected shortfall for a given
portfolio:

Set of           Period      Liquidity class
risk factors                   1     2     3     4     5
Full             Current     100    75    34    12     6
Reduced          Current      88    63    30     7     5
Reduced          Stress      112    83    47     9     7

As expected, the expected shortfall decreases with the liquidity horizon, because there are fewer
and fewer risk factors that belong to the liquidity class. We also verify that the ES for the
reduced set of risk factors is lower than the ES for the full set of risk factors.


TABLE 2.4: Scaled expected shortfall

k       SC_k    Full       Reduced    Reduced    Full/Stress     Full
                Current    Current    Stress     (not scaled)    Stress
1       1       100.00      88.00     112.00     127.27          127.27
2       1        75.00      63.00      83.00      98.81           98.81
3       √2       48.08      42.43      66.47      53.27           75.33
4       √2       16.97       9.90      12.73      15.43           21.82
5       √6       14.70      12.25      17.15       8.40           20.58
Total           135.80     117.31     155.91                     180.38


Results are given in Table 2.4. For each liquidity class $k$, we have reported the scaling
factor $\mathrm{SC}_k = \sqrt{(h_k - h_{k-1})/h_1}$, the scaled expected shortfall $\mathrm{ES}_\alpha^{\star}(w; h_k) = \mathrm{SC}_k \cdot \mathrm{ES}_\alpha(w; h_k)$
(columns 3, 4 and 5) and the total expected shortfall $\mathrm{ES}_\alpha(w) = \sqrt{\sum_{k=1}^{5} \left(\mathrm{ES}_\alpha^{\star}(w; h_k)\right)^2}$. It
is respectively equal to 135.80, 117.31 and 155.91 for the full/current, reduced/current and
reduced/stress case. Since the proportionality factor is equal to 135.80/117.31 = 1.1576,
we deduce that the ES for the full set of risk factors and the stress period is equal to
1.1576 × 155.91 = 180.48. Another way to calculate the ES is first to compute the ES for
the full set of risk factors and the stress period for each liquidity class $k$ and deduce the
scaled expected shortfall (columns 6 and 7). In this case, the ES for the full set of risk
factors and the stress period is equal to 180.38.
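The figures of Example 11 can be reproduced with a few lines of code. The sketch below uses the liquidity horizons of Table 2.3 and the expected shortfall values of the example; the function and variable names are illustrative.

```python
from math import sqrt

HORIZONS = [10, 20, 40, 60, 120]  # h_1..h_5 of Table 2.3, with h_0 = 0


def scaling_factors(horizons=HORIZONS):
    """SC_k = sqrt((h_k - h_{k-1}) / h_1)."""
    previous = [0] + horizons[:-1]
    return [sqrt((h - hp) / horizons[0]) for h, hp in zip(horizons, previous)]


def total_es(es_by_class, horizons=HORIZONS):
    """Square root of the sum of squared scaled expected shortfalls."""
    sc = scaling_factors(horizons)
    return sqrt(sum((s * e) ** 2 for s, e in zip(sc, es_by_class)))


full_current = [100, 75, 34, 12, 6]
reduced_current = [88, 63, 30, 7, 5]
reduced_stress = [112, 83, 47, 9, 7]

es_fc = total_es(full_current)
es_rc = total_es(reduced_current)
es_rs = total_es(reduced_stress)

# First method: apply the proportionality factor to the total ES.
es_fs_total = es_rs * es_fc / es_rc

# Second method: apply the factor to each liquidity class, then aggregate.
full_stress = [rs * fc / rc for rs, fc, rc
               in zip(reduced_stress, full_current, reduced_current)]
es_fs_class = total_es(full_stress)

if __name__ == "__main__":
    print(round(es_fc, 2), round(es_rc, 2), round(es_rs, 2))
    print(round(es_fs_total, 2), round(es_fs_class, 2))
```

The two methods give slightly different results because the proportionality factor is applied before or after the quadratic aggregation across liquidity classes.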


³² The bank must consider the most severe 12-month period of stress available.
³³ However, the Basel Committee indicates that the reduced set of risk factors must explain at least 75%
of the risk in periods of stress.

The final step for computing the capital requirement (also known as the ‘internally
modelled capital charge’) is to apply this formula:

$$\mathrm{IMCC} = \varrho \cdot \mathrm{IMCC}_{\mathrm{global}} + (1 - \varrho) \cdot \sum_{k=1}^{5} \mathrm{IMCC}_k$$

where $\varrho$ is equal to 50%, $\mathrm{IMCC}_{\mathrm{global}}$ is the stressed ES calculated with the internal model
and cross-correlations between risk classes, and $\mathrm{IMCC}_k$ is the stressed ES calculated at the risk
class level (interest rate, equity, foreign exchange, commodity and credit spread). IMCC is
then an average of two capital charges: one that takes into account cross-correlations and
another one that ignores diversification effects.


Capital requirement for non-modellable risk factors Concerning non-modellable
risk factors, the capital requirement is based on stress scenarios, which are equivalent to a
stressed expected shortfall. The Basel Committee distinguishes three types of non-modellable
risk factors:

1. Non-modellable idiosyncratic credit spread risk factors ($i = 1, \ldots, m_c$);
2. Non-modellable idiosyncratic equity risk factors ($j = 1, \ldots, m_e$);
3. Remaining non-modellable risk factors ($k = 1, \ldots, m_o$).

The capital requirement for non-modellable risk factors is then equal to:

$$\mathrm{SES} = \mathrm{SES}^{\mathrm{Credit}} + \mathrm{SES}^{\mathrm{Equity}} + \mathrm{SES}^{\mathrm{Other}}$$

where $\mathrm{SES}^{\mathrm{Credit}} = \sqrt{\sum_{i=1}^{m_c} \mathrm{SES}_i^2}$, $\mathrm{SES}^{\mathrm{Equity}} = \sqrt{\sum_{j=1}^{m_e} \mathrm{SES}_j^2}$ and:

$$\mathrm{SES}^{\mathrm{Other}} = \sqrt{\varrho^2 \cdot \left(\sum_{k=1}^{m_o} \mathrm{SES}_k\right)^2 + \left(1 - \varrho^2\right) \cdot \sum_{k=1}^{m_o} \mathrm{SES}_k^2}$$

For non-modellable credit or equity risks, we assume a zero correlation. For the remaining
non-modellable risks, the correlation $\varrho$ is set to 60%. An important issue for computing
SES is the liquidity horizon. The Basel Committee requires the bank to consider the same values
used for modellable risk factors, with a floor of 20 days. For idiosyncratic credit spreads,
the liquidity horizon is set to 120 days.
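The aggregation of the stress scenario charges can be sketched as follows, under the reading of the formulas given above: zero correlation for the idiosyncratic credit and equity risk factors, and a correlation of 60% for the remaining ones. The function name and the inputs (lists of individual stress scenario charges) are hypothetical.

```python
from math import sqrt


def ses_total(credit, equity, other, rho=0.6):
    """Aggregate the stress scenario charges of non-modellable risk factors."""
    ses_credit = sqrt(sum(s * s for s in credit))  # zero correlation
    ses_equity = sqrt(sum(s * s for s in equity))  # zero correlation
    ses_other = sqrt(rho**2 * sum(other) ** 2
                     + (1.0 - rho**2) * sum(s * s for s in other))
    return ses_credit + ses_equity + ses_other
```

Setting `rho` to 1 gives the simple sum of the remaining charges, whereas setting it to 0 gives the square root of the sum of squares, which shows how the correlation parameter interpolates between the two cases.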


Capital requirement for default risk The default risk capital (DRC) is calculated 
using a value-at-risk model with a 99.9% confidence level. The computation must be done 
using the same default probabilities that are used for the IRB approach. This implies that 
default risk is calculated under the historical probability measure, and not under the risk- 
neutral probability measure. This is why market-implied default probabilities are prohibited. 


Capital requirement for the market risk For eligible trading desks that are approved
to use the IMA approach, the capital requirement for market risk is equal to:

$$\mathcal{K}^{\mathrm{IMA}} = \max\left(\mathrm{IMCC}_{t-1} + \mathrm{SES}_{t-1}, \frac{m_c \sum_{i=1}^{60} \mathrm{IMCC}_{t-i} + \sum_{i=1}^{60} \mathrm{SES}_{t-i}}{60}\right) + \mathrm{DRC} \qquad (2.3)$$

where $m_c = 1.5 + \xi$ and $0 \le \xi \le 0.5$. This formula is similar to the one defined in the Basel
I Accord. We notice that the magnitude of the multiplication factor $m_c$ has changed since
we have $1.5 \le m_c \le 2$.


TABLE 2.5: Value of the penalty coefficient ξ in Basel III

Zone      Number of exceptions     ξ
Green     0-4                      0.00
Amber     5                        0.20
          6                        0.26
          7                        0.33
          8                        0.38
          9                        0.42
Red       10+                      0.50


Backtesting The backtesting procedure continues to be based on the daily VaR with
a 99% confidence level and a sample of the last 250 observations. Table 2.5 presents the
definition of the color zones. We notice that the amber zone replaces the yellow zone,
and the values of the penalty coefficient $\xi$ have changed. The value of the multiplier $m_c =
1.5 + \xi$ then depends on the one-year backtesting procedure at the bank-wide level. However,
the bank must also conduct backtesting exercises for each eligible trading desk for
two reasons. First, the P&L attribution (PLA) is one of the pillars for the approval of
trading desks by supervisory authorities. It is highly reinforced with several PLA tests,
which distinguish actual P&L (including intra-day trading activities) and hypothetical P&L
(static portfolio). Second, if one eligible trading desk is located in the amber zone, the
formula (2.3) is modified in order to take into account a capital surcharge. Moreover, if one
eligible trading desk has more than 12 exceptions³⁴, the bank must use the SA-TB approach
for calculating the capital charge of this trading desk.
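The link between the number of exceptions, the multiplier and formula (2.3) can be sketched as follows. The penalty coefficients are those of Table 2.5. The function names and the convention that the first element of each series is the value at $t-1$ are assumptions, and neither the amber-zone surcharge nor the desk-level rules are modelled.

```python
# Penalty coefficient xi of the amber zone (Table 2.5).
XI_AMBER = {5: 0.20, 6: 0.26, 7: 0.33, 8: 0.38, 9: 0.42}


def multiplier(exceptions):
    """m_c = 1.5 + xi, where xi depends on the number of exceptions."""
    if exceptions <= 4:
        xi = 0.0  # green zone
    elif exceptions >= 10:
        xi = 0.5  # red zone
    else:
        xi = XI_AMBER[exceptions]
    return 1.5 + xi


def ima_capital(imcc, ses, drc, exceptions):
    """Formula (2.3); imcc[0] and ses[0] are the values at t-1."""
    if len(imcc) != 60 or len(ses) != 60:
        raise ValueError("the formula uses the last 60 daily values")
    m_c = multiplier(exceptions)
    average = (m_c * sum(imcc) + sum(ses)) / 60
    return max(imcc[0] + ses[0], average) + drc
```

When the daily charges are constant, the averaged term dominates because $m_c \ge 1.5$. The first term only binds after a sharp increase in the most recent charge.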


2.2 Statistical estimation methods of risk measures 


We have seen that Basel I is based on the value-at-risk while Basel III uses the expected 
shortfall for computing the capital requirement for market risk. In this section, we define 
precisely what a risk measure is and we analyze the value-at-risk and the expected shortfall, 
which are the two regulatory risk measures. In particular, we present the three statistical 
approaches (historical, analytical and Monte Carlo) that are available. The last part of this 
section is dedicated to options and exotic products. 


2.2.1 Definition 
2.2.1.1 Coherent risk measures 


Let $\mathcal{R}(w)$ be the risk measure of portfolio $w$. In this section, we define the different
properties that the risk measure $\mathcal{R}(w)$ should satisfy in order to be acceptable in terms of
capital allocation. Following Artzner et al. (1999), $\mathcal{R}$ is said to be ‘coherent’ if it satisfies
the following properties:


³⁴ The Basel Committee adds a second inclusive condition: the trading desk must have fewer than 30
exceptions at the 97.5% confidence level. This remark shows that the bank must in fact conduct two
backtesting procedures at the trading desk level: one based on the 99% confidence level and another one
based on the 97.5% confidence level.

1. Subadditivity

$$\mathcal{R}(w_1 + w_2) \le \mathcal{R}(w_1) + \mathcal{R}(w_2)$$

The risk of two portfolios should be less than adding the risk of the two separate
portfolios.

2. Homogeneity

$$\mathcal{R}(\lambda w) = \lambda \mathcal{R}(w) \quad \text{if } \lambda \ge 0$$

Leveraging or deleveraging of the portfolio increases or decreases the risk measure in
the same magnitude.

3. Monotonicity

$$\text{if } w_1 \prec w_2, \text{ then } \mathcal{R}(w_1) \ge \mathcal{R}(w_2)$$

If portfolio $w_2$ has a better return than portfolio $w_1$ under all scenarios, risk measure
$\mathcal{R}(w_1)$ should be higher than risk measure $\mathcal{R}(w_2)$.

4. Translation invariance

$$\text{if } m \in \mathbb{R}, \text{ then } \mathcal{R}(w + m) = \mathcal{R}(w) - m$$

Adding a cash position of amount $m$ to the portfolio reduces the risk by $m$. This
implies that we can hedge the risk of the portfolio by considering a capital that is
equal to the risk measure:

$$\mathcal{R}(w + \mathcal{R}(w)) = \mathcal{R}(w) - \mathcal{R}(w) = 0$$


The definition of coherent risk measures generated considerable interest in quantitative
risk management. Thus, Föllmer and Schied (2002) propose to replace the homogeneity and
subadditivity conditions by a weaker condition called the convexity property:

$$\mathcal{R}(\lambda w_1 + (1 - \lambda) w_2) \le \lambda \mathcal{R}(w_1) + (1 - \lambda) \mathcal{R}(w_2)$$

This condition means that diversification should not increase the risk.


We can write the loss of a portfolio as $L(w) = -P_t(w) R_{t+h}(w)$ where $P_t(w)$ and
$R_{t+h}(w)$ are the current value and the future return of the portfolio. Without loss of
generality³⁵, we assume that $P_t(w)$ is equal to 1. In this case, the expected loss $\mathbb{E}[L(w)]$
is the opposite of the expected return $\mu(w)$ of the portfolio and the standard deviation
$\sigma(L(w))$ is equal to the portfolio volatility $\sigma(w)$. We consider then different risk measures:

• Volatility of the loss

$$\mathcal{R}(w) = \sigma(L(w)) = \sigma(w)$$

The volatility of the loss is the standard deviation of the portfolio loss.

• Standard deviation-based risk measure

$$\mathcal{R}(w) = \mathrm{SD}_c(w) = \mathbb{E}[L(w)] + c \cdot \sigma(L(w)) = -\mu(w) + c \cdot \sigma(w)$$

To obtain this measure, we scale the volatility by factor $c > 0$ and subtract the
expected return of the portfolio.

• Value-at-risk

$$\mathcal{R}(w) = \mathrm{VaR}_\alpha(w) = \inf\left\{\ell : \Pr\{L(w) \le \ell\} \ge \alpha\right\}$$

The value-at-risk is the $\alpha$-quantile of the loss distribution $F$ and we note it $F^{-1}(\alpha)$.

• Expected shortfall

$$\mathcal{R}(w) = \mathrm{ES}_\alpha(w) = \frac{1}{1 - \alpha} \int_\alpha^1 \mathrm{VaR}_u(w) \, \mathrm{d}u$$

The expected shortfall is the average of the VaRs at level $\alpha$ and higher (Acerbi and
Tasche, 2002). We note that it is also equal to the expected loss given that the loss is
beyond the value-at-risk:

$$\mathrm{ES}_\alpha(w) = \mathbb{E}[L(w) \mid L(w) \ge \mathrm{VaR}_\alpha(w)]$$

By definition, the expected shortfall is greater than or equal to the value-at-risk for a
given confidence level.

³⁵ The homogeneity property implies that:

$$\mathcal{R}\left(\frac{w}{P_t(w)}\right) = \frac{\mathcal{R}(w)}{P_t(w)}$$

We can therefore calculate the risk measure using the absolute loss (expressed in $) or the relative loss
(expressed in %). The two approaches are perfectly equivalent.


We can show that the standard deviation-based risk measure and the expected shortfall
satisfy the previous coherency and convexity conditions. For the value-at-risk, the subadditivity
property does not hold in general. This is a problem because the portfolio risk may
not be meaningful in this case. More curiously, the volatility is not a coherent risk measure
because it does not satisfy the translation invariance axiom.


Example 12 We consider a $100 defaultable zero-coupon bond, whose default probability
is equal to 200 bps. We assume that the recovery rate $R$ is a binary random variable with
$\Pr\{R = 0.25\} = \Pr\{R = 0.75\} = 50\%$.


The probability tree of the loss $L$ of the zero-coupon bond has three branches: no default
with a probability of 98%, a loss of $25 with a probability of 1% and a loss of $75 with a
probability of 1%. We deduce that $F(0) = \Pr\{L \le 0\} = 98\%$, $F(25) = \Pr\{L \le 25\} = 99\%$
and $F(75) = \Pr\{L \le 75\} = 100\%$.


It follows that the 99% value-at-risk is equal to $25, and we have:

$$\mathrm{ES}_{99\%}(L) = \mathbb{E}[L \mid L \ge 25] = \frac{25 + 75}{2} = \$50$$

We assume now that the portfolio contains two zero-coupon bonds, whose default times are
independent. The joint probability distribution of $(L_1, L_2)$ is given below:

             L1 = 0     L1 = 25    L1 = 75
L2 = 0       96.04%      0.98%      0.98%     98.00%
L2 = 25       0.98%      0.01%      0.01%      1.00%
L2 = 75       0.98%      0.01%      0.01%      1.00%
             98.00%      1.00%      1.00%

We deduce that the probability distribution function of $L = L_1 + L_2$ is:

ℓ              0         25        50        75        100       150
Pr{L = ℓ}      96.04%    1.96%     0.01%     1.96%     0.02%     0.01%
Pr{L ≤ ℓ}      96.04%    98%       98.01%    99.97%    99.99%    100%

It follows that $\mathrm{VaR}_{99\%}(L) = 75$ and:

$$\mathrm{ES}_{99\%}(L) = \frac{75 \times 1.96\% + 100 \times 0.02\% + 150 \times 0.01\%}{1.96\% + 0.02\% + 0.01\%} = \$75.63$$


For this example, the value-at-risk does not satisfy the subadditivity property, which is not
the case of the expected shortfall³⁶.
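Example 12 can be checked with exact arithmetic. The sketch below represents a discrete loss distribution as a dictionary mapping each loss to its probability, computes the value-at-risk as the smallest loss whose cumulative probability reaches $\alpha$, and computes the expected shortfall as the expected loss conditional on the loss being at least equal to the value-at-risk. The function names are illustrative, and `Fraction` is used only to avoid rounding issues at the 99% threshold.

```python
from fractions import Fraction
from itertools import product


def value_at_risk(dist, alpha):
    """VaR = inf{l : Pr{L <= l} >= alpha} for a discrete distribution."""
    cumulative = Fraction(0)
    for loss in sorted(dist):
        cumulative += dist[loss]
        if cumulative >= alpha:
            return loss
    raise ValueError("probabilities do not reach alpha")


def expected_shortfall(dist, alpha):
    """ES = E[L | L >= VaR], the conditional form used in Example 12."""
    var = value_at_risk(dist, alpha)
    tail = {l: p for l, p in dist.items() if l >= var}
    return sum(l * p for l, p in tail.items()) / sum(tail.values())


def convolve(d1, d2):
    """Distribution of the sum of two independent discrete losses."""
    out = {}
    for (l1, p1), (l2, p2) in product(d1.items(), d2.items()):
        out[l1 + l2] = out.get(l1 + l2, Fraction(0)) + p1 * p2
    return out


alpha = Fraction(99, 100)
bond = {0: Fraction(98, 100), 25: Fraction(1, 100), 75: Fraction(1, 100)}
two_bonds = convolve(bond, bond)

if __name__ == "__main__":
    print(value_at_risk(bond, alpha), float(expected_shortfall(bond, alpha)))
    print(value_at_risk(two_bonds, alpha),
          round(float(expected_shortfall(two_bonds, alpha)), 2))
```

The value-at-risk of the two-bond portfolio (75) exceeds the sum of the individual values-at-risk (50), whereas the expected shortfall of the portfolio (75.63) stays below the sum of the individual expected shortfalls (100).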


For this reason, the value-at-risk has been frequently criticized by academics. They 
also pointed out that it does not capture the tail risk of the portfolio. This led the Basel 
Committee to replace the 99% value-at-risk by the 97.5% expected shortfall for the internal 
model-based approach in Basel III (BCBS, 2019). 


2.2.1.2 Value-at-risk 


The value-at-risk $\mathrm{VaR}_\alpha(w; h)$ is defined as the potential loss which the portfolio $w$ can
suffer for a given confidence level $\alpha$ and a fixed holding period $h$. Three parameters are
necessary to compute this risk measure:

• the holding period $h$, which indicates the time period to calculate the loss;

• the confidence level $\alpha$, which gives the probability that the loss is lower than the
value-at-risk;

• the portfolio $w$, which gives the allocation in terms of risky assets and is related to
the risk factors.

Without the first two parameters, it is not possible to interpret the amount of the
value-at-risk, which is expressed in monetary units. For instance, a portfolio with a VaR of $100
mn may be regarded as highly risky if the VaR corresponds to a 90% confidence level and a
one-day holding period, but it may be a low risk investment if the confidence level is 99.9%
and the holding period is one year.


We note $P_t(w)$ the mark-to-market value of the portfolio $w$ at time $t$. The profit and
loss between $t$ and $t + h$ is equal to:

$$\Pi(w) = P_{t+h}(w) - P_t(w)$$

We define the loss of the portfolio as the opposite of the P&L: $L(w) = -\Pi(w)$. At time
$t$, the loss is not known and is therefore random. From a statistical point of view, the
value-at-risk $\mathrm{VaR}_\alpha(w; h)$ is the quantile³⁷ of the loss for the probability $\alpha$:

$$\Pr\{L(w) \le \mathrm{VaR}_\alpha(w; h)\} = \alpha$$

This means that the probability that the random loss is lower than the VaR is exactly equal
to the confidence level. We finally obtain:

$$\mathrm{VaR}_\alpha(w; h) = F_L^{-1}(\alpha)$$

where $F_L$ is the distribution function of the loss³⁸.

³⁶ We have $\mathrm{VaR}_{99\%}(L_1) + \mathrm{VaR}_{99\%}(L_2) = 50$, $\mathrm{VaR}_{99\%}(L_1 + L_2) > \mathrm{VaR}_{99\%}(L_1) + \mathrm{VaR}_{99\%}(L_2)$,
$\mathrm{ES}_{99\%}(L_1) + \mathrm{ES}_{99\%}(L_2) = 100$ and $\mathrm{ES}_{99\%}(L_1 + L_2) < \mathrm{ES}_{99\%}(L_1) + \mathrm{ES}_{99\%}(L_2)$.


We notice that the previous analysis assumes that the portfolio remains unchanged between
\(t\) and \(t + h\). In practice, this is not the case because of trading and rebalancing activities.
The holding period \(h\) then depends on the nature of the portfolio. The Basel Committee
has set \(h\) to one trading day for performing the backtesting procedure in order to minimize
rebalancing impacts. However, \(h\) is equal to 10 trading days for capital requirements in Basel
I. It is the period which is considered necessary to ensure the rebalancing of the portfolio if
it is too risky or if it costs too much regulatory capital. The confidence level \(\alpha\) is equal to
99%, meaning that an exception is expected on average once every 100 trading days. It clearly
does not correspond to an extreme risk measure. From the point of view of regulators, the 99%
value-at-risk then gives a measure of the market risk under normal conditions.


2.2.1.3 Expected shortfall 


The expected shortfall \(\mathrm{ES}_\alpha(w; h)\) is defined as the expected loss beyond the value-at-risk
of the portfolio:

\[ \mathrm{ES}_\alpha(w; h) = \mathbb{E}\left[L(w) \mid L(w) \ge \mathrm{VaR}_\alpha(w; h)\right] \]

Therefore, it depends on the three parameters (\(h\), \(\alpha\) and \(w\)) of the VaR. Since we have
\(\mathrm{ES}_\alpha(w; h) \ge \mathrm{VaR}_\alpha(w; h)\), the expected shortfall is considered as a risk measure under
more extreme conditions than the value-at-risk. By construction, we also have:

\[ \alpha_1 > \alpha_2 \Rightarrow \mathrm{ES}_{\alpha_1}(w; h) \ge \mathrm{VaR}_{\alpha_2}(w; h) \]

However, it is impossible to compare the expected shortfall and the value-at-risk when the
ES confidence level is lower than the VaR confidence level (\(\alpha_1 < \alpha_2\)). This is why it is
difficult to compare the ES in Basel III (\(\alpha = 97.5\%\)) and the VaR in Basel I (\(\alpha = 99\%\)).


[37] If the distribution of the loss is not continuous, the statistical definition of the quantile function is:

\[ \mathrm{VaR}_\alpha(w; h) = \inf\{x : \Pr\{L(w) \le x\} \ge \alpha\} \]

[38] In a similar way, we have \(\Pr\{\Pi(w) \ge -\mathrm{VaR}_\alpha(w; h)\} = \alpha\) and \(\mathrm{VaR}_\alpha(w; h) = -F_\Pi^{-1}(1 - \alpha)\) where \(F_\Pi\)
is the distribution function of the P&L.


2.2.1.4 Estimator or estimate?


To calculate the value-at-risk or the expected shortfall, we first have to identify the risk
factors that affect the future value of the portfolio. Their number can be large or small
depending on the market, but also on the portfolio. For instance, in the case of an equity
portfolio, we can use the one-factor model (CAPM), a multi-factor model (industry risk
factors, Fama-French risk factors, etc.) or we can have a risk factor for each individual
stock. For interest rate products, the Basel Committee requires the bank to use at least
six factors to model the yield curve risk in Basel I and ten factors in Basel III. This contrasts
with currency and commodity portfolios where we must take into account one risk factor by
exchange rate and by currency. Let \((F_1, \ldots, F_m)\) be the vector of risk factors. We assume
that there is a pricing function \(g\) such that:


\[ P_t(w) = g(F_{1,t}, \ldots, F_{m,t}; w) \]

We deduce that the expression of the random loss is the difference between the current
value and the future value of the portfolio:

\[ \begin{aligned} L(w) &= P_t(w) - g(F_{1,t+h}, \ldots, F_{m,t+h}; w) \\ &= \ell(F_{1,t+h}, \ldots, F_{m,t+h}; w) \end{aligned} \]

where \(\ell\) is the loss function. The big issue is then to model the future values of risk factors.
In practice, the distribution \(F_L\) is not known because the multidimensional distribution
of the risk factors is not known. This is why we have to estimate \(F_L\), meaning that the
calculated VaR and ES are also two estimated values:

\[ \widehat{\mathrm{VaR}}_\alpha(w; h) = \hat{F}_L^{-1}(\alpha) = -\hat{F}_\Pi^{-1}(1 - \alpha) \]

and:

\[ \widehat{\mathrm{ES}}_\alpha(w; h) = \frac{1}{1 - \alpha} \int_\alpha^1 \hat{F}_L^{-1}(u) \, \mathrm{d}u \]

Therefore, we have to distinguish between the estimator and the estimate. Indeed,
the calculated value-at-risk or expected shortfall is an estimate, meaning that it is a
realization of the corresponding estimator. In practice, there are three approaches to calculate
the risk measure depending on the method used to estimate \(F_L\):


1. the historical value-at-risk/expected shortfall, which is also called the empirical or 
non-parametric VaR/ES; 


2. the analytical (or parametric) value-at-risk/expected shortfall;


3. the Monte Carlo (or simulated) value-at-risk/expected shortfall.


The historical approach is the most widely used method by banks for computing the capital 
charge. This is an unbiased estimator, but with a large variance. On the contrary, the 
analytical estimator is biased, because it assumes a parametric function for the risk factors, 
but it has a lower variance than the historical estimator. Finally, the Monte Carlo estimator 
can produce an unbiased estimator with a small variance. However, it could be difficult to 
put in place because it requires large computational times. 


Remark 4 In this book, we use the statistical expressions \(\mathrm{VaR}_\alpha(w; h)\) and \(\mathrm{ES}_\alpha(w; h)\) in
place of \(\widehat{\mathrm{VaR}}_\alpha(w; h)\) and \(\widehat{\mathrm{ES}}_\alpha(w; h)\) in order to reduce the amount of notation.


2.2.2 Historical methods 


The historical VaR corresponds to a non-parametric estimate of the value-at-risk. For
that, we consider the empirical distribution of the risk factors observed in the past. Let
\((F_{1,s}, \ldots, F_{m,s})\) be the vector of risk factors observed at time \(s < t\). If we calculate the
future P&L with this historical scenario, we obtain:

\[ \Pi_s(w) = g(F_{1,s}, \ldots, F_{m,s}; w) - P_t(w) \]

If we consider \(n_S\) historical scenarios (\(s = 1, \ldots, n_S\)), the empirical distribution \(\hat{F}_\Pi\) is
described by the following probability distribution:

\[ \begin{array}{c|cccc} \Pi(w) & \Pi_1(w) & \Pi_2(w) & \cdots & \Pi_{n_S}(w) \\ \hline p_s & 1/n_S & 1/n_S & \cdots & 1/n_S \end{array} \]

because each probability of occurrence is the same for all the historical scenarios. To
calculate the empirical quantile \(\hat{F}_L^{-1}(\alpha)\), we can use two approaches: the order statistic approach
and the kernel density approach.


2.2.2.1 The order statistic approach 


Let \(X_1, \ldots, X_n\) be a sample from a continuous distribution \(F\). Suppose that for a given
scalar \(\alpha \in \,]0, 1[\), there exists a sequence \(\{a_n\}\) such that \(\sqrt{n}\,(a_n - n\alpha) \to 0\). Lehmann (1999)
shows that:

\[ \sqrt{n}\left(X_{(a_n:n)} - F^{-1}(\alpha)\right) \to \mathcal{N}\left(0, \frac{\alpha(1 - \alpha)}{f^2\left(F^{-1}(\alpha)\right)}\right) \tag{2.4} \]

This result implies that we can estimate the quantile \(F^{-1}(\alpha)\) by means of the \(n\alpha\)-th
order statistic. Let us apply the previous result to our problem. We calculate the order
statistics associated to the P&L sample \(\{\Pi_1(w), \ldots, \Pi_{n_S}(w)\}\):

\[ \min_s \Pi_s(w) = \Pi_{(1:n_S)} \le \Pi_{(2:n_S)} \le \cdots \le \Pi_{(n_S-1:n_S)} \le \Pi_{(n_S:n_S)} = \max_s \Pi_s(w) \]


The value-at-risk with a confidence level \(\alpha\) is then equal to the opposite of the \(n_S(1 - \alpha)\)-th
order statistic of the P&L:

\[ \mathrm{VaR}_\alpha(w; h) = -\Pi_{(n_S(1-\alpha):n_S)} \tag{2.5} \]

If \(n_S(1 - \alpha)\) is not an integer, we consider the interpolation scheme:

\[ \mathrm{VaR}_\alpha(w; h) = -\left(\Pi_{(q:n_S)} + \left(n_S(1 - \alpha) - q\right)\left(\Pi_{(q+1:n_S)} - \Pi_{(q:n_S)}\right)\right) \]

where \(q = q_\alpha(n_S) = \lfloor n_S(1 - \alpha) \rfloor\) is the integer part of \(n_S(1 - \alpha)\). For instance, if \(n_S = 100\),
the 99% value-at-risk corresponds to the largest loss. In the case where we use 250 historical
scenarios, the 99% value-at-risk is the mean between the second and third largest losses:

\[ \begin{aligned} \mathrm{VaR}_\alpha(w; h) &= -\left(\Pi_{(2:250)} + (2.5 - 2)\left(\Pi_{(3:250)} - \Pi_{(2:250)}\right)\right) \\ &= -\frac{1}{2}\left(\Pi_{(2:250)} + \Pi_{(3:250)}\right) \\ &= \frac{1}{2}\left(L_{(249:250)} + L_{(248:250)}\right) \end{aligned} \]
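The order statistic rule and its interpolation scheme can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `historical_var` is hypothetical, the sample below pads six P&L values with zeros to reach 250 scenarios, and the rank is rounded to avoid binary floating-point noise in \(n_S(1 - \alpha)\).

```python
import math

def historical_var(pnl, alpha):
    """Historical VaR from a P&L sample using order statistics.

    The VaR is minus the n_S(1 - alpha)-th smallest P&L, with linear
    interpolation between neighbouring order statistics when that rank
    is not an integer.
    """
    ordered = sorted(pnl)                 # Pi_(1:n) <= ... <= Pi_(n:n)
    n = len(ordered)
    # Rounding guards against values such as 2.5000000000000022.
    rank = round(n * (1.0 - alpha), 9)
    q = math.floor(rank)
    if q < 1:
        raise ValueError("not enough scenarios for this confidence level")
    if q == rank or q == n:
        return -ordered[q - 1]
    lower, upper = ordered[q - 1], ordered[q]
    return -(lower + (rank - q) * (upper - lower))

# Illustrative sample: six large losses padded with zeros (hypothetical filler).
worst = [-84.34, -51.46, -43.31, -40.75, -35.91, -35.42]
var_250 = historical_var(worst + [0.0] * 244, 0.99)   # mean of 2nd and 3rd largest losses
var_100 = historical_var(worst + [0.0] * 94, 0.99)    # largest loss
print(var_250, var_100)
```

With 250 scenarios the rank is 2.5, so the result is the average of the second and third largest losses; with 100 scenarios the rank is exactly 1 and the largest loss is returned.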


Remark 5 We reiterate that \(\mathrm{VaR}_\alpha(w; h)\) defined by Equation (2.5) is an estimator with
an asymptotic variance given by Equation (2.4). Suppose that the loss of the portfolio is
Gaussian and \(L(w) \sim \mathcal{N}(0, 1)\). The exact value-at-risk is \(\Phi^{-1}(\alpha)\) and takes the values 1.28
or 2.33 if \(\alpha\) is equal to 90% or 99%. The standard deviation of the estimator depends on
the number \(n_S\) of historical scenarios:

\[ \sigma\left(\mathrm{VaR}_\alpha(w; h)\right) \approx \frac{\sqrt{\alpha(1 - \alpha)}}{\sqrt{n_S}\, \phi\left(\Phi^{-1}(\alpha)\right)} \]

In Figure 2.5, we have reported the density function of the VaR estimator. We notice that
the estimation error decreases with \(n_S\). Moreover, it is lower for \(\alpha = 90\%\) than for \(\alpha = 99\%\),
because the density of the Gaussian distribution at the point \(x = 1.28\) is larger than at the
point \(x = 2.33\).
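To get a feeling for the orders of magnitude in Remark 5, the asymptotic standard deviation can be evaluated numerically. The sketch below assumes a standard Gaussian loss, as in the remark; the function name `var_estimator_std` and the grid of scenario counts are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def var_estimator_std(alpha, n_scenarios):
    """Asymptotic standard deviation of the order-statistic VaR estimator
    when the loss is N(0, 1), as implied by Equation (2.4)."""
    std_normal = NormalDist()
    quantile = std_normal.inv_cdf(alpha)     # exact VaR, Phi^{-1}(alpha)
    density = std_normal.pdf(quantile)       # phi(Phi^{-1}(alpha))
    return sqrt(alpha * (1.0 - alpha)) / (sqrt(n_scenarios) * density)

for alpha in (0.90, 0.99):
    for n_scenarios in (100, 250, 1000):
        print(alpha, n_scenarios, round(var_estimator_std(alpha, n_scenarios), 3))
```

For a given number of scenarios, the 99% estimator is noticeably less precise than the 90% estimator, and the error shrinks at the rate \(1/\sqrt{n_S}\).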
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FIGURE 2.5: Density of the VaR estimator (Gaussian case) 


Example 13 We consider a portfolio composed of 10 Apple stocks and 20 Coca-Cola stocks.
The current date is 2 January 2015.

The mark-to-market of the portfolio is:

\[ P_t(w) = 10 \times P_{1,t} + 20 \times P_{2,t} \]

where \(P_{1,t}\) and \(P_{2,t}\) are the stock prices of Apple and Coca-Cola. We assume that the market
risk factors correspond to the daily stock returns \(R_{1,t}\) and \(R_{2,t}\). We deduce that the P&L
for the scenario \(s\) is equal to:

\[ \Pi_s(w) = \underbrace{10 \times P_{1,s} + 20 \times P_{2,s}}_{g(R_{1,s}, R_{2,s}; w)} - P_t(w) \]

where \(P_{i,s} = P_{i,t} \times (1 + R_{i,s})\) is the simulated price of stock \(i\) for the scenario \(s\). In Table
2.6, we have reported the values of the first ten historical scenarios[39]. Using these scenarios,
we can calculate the simulated price \(P_{i,s}\) using the current price of the stocks ($109.33
for Apple and $42.14 for Coca-Cola). For instance, in the case of the 9th scenario, we
obtain:

\[ \begin{aligned} P_{1,s} &= 109.33 \times (1 - 0.77\%) = \$108.49 \\ P_{2,s} &= 42.14 \times (1 - 1.04\%) = \$41.70 \end{aligned} \]

[39] For instance, the market risk factor for the first historical scenario and for Apple is calculated as follows:

\[ R_{1,1} = \frac{109.33}{110.38} - 1 = -0.95\% \]

We then deduce the simulated mark-to-market \(\mathrm{MtM}_s(w) = g(R_{1,s}, R_{2,s}; w)\), the current
value of the portfolio[40] and the P&L \(\Pi_s(w)\). These data are given in Table 2.7. In addition
to the first ten historical scenarios, we also report the results for the six worst cases and the
last scenario[41]. We notice that the largest loss is reached for the 236th historical scenario at
the date of 28 January 2014. If we rank the scenarios, the worst P&Ls are -84.34, -51.46,
-43.31, -40.75, -35.91 and -35.42. We deduce that the daily historical VaR is equal to:

\[ \mathrm{VaR}_{99\%}(w; \text{one day}) = \frac{1}{2}(51.46 + 43.31) = \$47.39 \]


If we assume that \(m_c = 3\), the corresponding capital charge represents 23.22% of the
portfolio value:

\[ \mathcal{K}^{\mathrm{VaR}} = 3 \times \sqrt{10} \times 47.39 = \$449.54 \]


TABLE 2.6: Computation of the market risk factors \(R_{1,s}\) and \(R_{2,s}\)

                        Apple              Coca-Cola
  s   Date          Price   R_{1,s}     Price   R_{2,s}
  1   2015-01-02   109.33   -0.95%      42.14   -0.19%
  2   2014-12-31   110.38   -1.90%      42.22   -1.26%
  3   2014-12-30   112.52   -1.22%      42.76   -0.23%
  4   2014-12-29   113.91   -0.07%      42.86   -0.23%
  5   2014-12-26   113.99    1.77%      42.96    0.05%
  6   2014-12-24   112.01   -0.47%      42.94   -0.07%
  7   2014-12-23   112.54   -0.35%      42.97    1.46%
  8   2014-12-22   112.94    1.04%      42.35    0.95%
  9   2014-12-19   111.78   -0.77%      41.95   -1.04%
 10   2014-12-18   112.65    2.96%      42.39    2.02%


Under Basel 2.5, we have to compute a second capital charge for the stressed VaR. If
we assume that the stressed period is from 9 October 2007 to 9 March 2009, we obtain
356 stressed scenarios. By applying the previous method, the six largest simulated losses
are[42] 219.20 (29/09/2008), 127.84 (17/09/2008), 126.86 (07/10/2008), 124.23 (14/10/2008),
115.24 (23/01/2008) and 99.55 (29/09/2008). The 99% SVaR corresponds to the 3.56th order
statistic. We deduce that:

\[ \begin{aligned} \mathrm{SVaR}_{99\%}(w; \text{one day}) &= 126.86 + (3.56 - 3) \times (124.23 - 126.86) \\ &= \$125.38 \end{aligned} \]

It follows that:

\[ \mathcal{K}^{\mathrm{SVaR}} = 3 \times \sqrt{10} \times 125.38 = \$1\,189.49 \]

The total capital requirement under Basel 2.5 is then:

\[ \mathcal{K} = \mathcal{K}^{\mathrm{VaR}} + \mathcal{K}^{\mathrm{SVaR}} = \$1\,639.03 \]

It represents 84.6% of the current mark-to-market!
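The arithmetic of this capital charge can be checked with a short script. It is only a sketch: the helper `capital_charge` is a hypothetical name, the inputs are the losses quoted above, and small differences in the last digits come from the rounding of intermediate figures in the text.

```python
from math import sqrt

def capital_charge(var_1d, multiplier=3.0, horizon_days=10):
    """Scale a one-day VaR to a 10-day capital charge (square-root-of-time rule)."""
    return multiplier * sqrt(horizon_days) * var_1d

# Ordinary VaR: rank 2.5 with 250 scenarios, i.e. the average of the
# 2nd and 3rd largest losses.
var_1d = 0.5 * (51.46 + 43.31)

# Stressed VaR: rank 3.56 with 356 scenarios, interpolated between the
# 3rd and 4th largest losses.
svar_1d = 126.86 + (3.56 - 3) * (124.23 - 126.86)

k_var = capital_charge(var_1d)
k_svar = capital_charge(svar_1d)
portfolio_value = 10 * 109.33 + 20 * 42.14
ratio = (k_var + k_svar) / portfolio_value
print(round(k_var, 2), round(k_svar, 2), round(ratio, 3))
```

The total charge is dominated by the stressed component, which is why the requirement comes close to the mark-to-market value of this purely directional portfolio.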


[40] We have:

\[ P_t(w) = 10 \times 109.33 + 20 \times 42.14 = \$1\,936.10 \]

[41] We assume that the value-at-risk is calculated using 250 historical scenarios (from 2015-01-02 to
2014-01-07).
[42] We indicate in brackets the scenario day of the loss.




TABLE 2.7: Computation of the simulated P&L \(\Pi_s(w)\)

   s   Date         R_{1,s}   P_{1,s}   R_{2,s}   P_{2,s}   MtM_s(w)   Π_s(w)
   1   2015-01-02   -0.95%    108.29    -0.19%    42.06     1924.10    -12.00
   2   2014-12-31   -1.90%    107.25    -1.26%    41.61     1904.66    -31.44
   3   2014-12-30   -1.22%    108.00    -0.23%    42.04     1920.79    -15.31
   4   2014-12-29   -0.07%    109.25    -0.23%    42.04     1933.37     -2.73
   5   2014-12-26    1.77%    111.26     0.05%    42.16     1955.82     19.72
   6   2014-12-24   -0.47%    108.82    -0.07%    42.11     1930.36     -5.74
   7   2014-12-23   -0.35%    108.94     1.46%    42.76     1944.57      8.47
   8   2014-12-22    1.04%    110.46     0.95%    42.54     1955.48     19.38
   9   2014-12-19   -0.77%    108.49    -1.04%    41.70     1918.91    -17.19
  10   2014-12-18    2.96%    112.57     2.02%    42.99     1985.51     49.41
  23   2014-12-01   -3.25%    105.78    -0.62%    41.88     1895.35    -40.75
  69   2014-09-25   -3.81%    105.16    -1.16%    41.65     1884.64    -51.46
  85   2014-09-03   -4.22%    104.72     0.34%    42.28     1892.79    -43.31
 108   2014-07-31   -2.60%    106.49    -0.83%    41.79     1900.68    -35.42
 236   2014-01-28   -7.99%    100.59     0.36%    42.29     1851.76    -84.34
 242   2014-01-17   -2.45%    106.65    -1.08%    41.68     1900.19    -35.91
 250   2014-01-07   -0.72%    108.55     0.30%    42.27     1930.79     -5.31


Remark 6 As the previous example has shown, directional exposures are highly penalized
under Basel 2.5. More generally, it is not always obvious that capital requirements are lower
with IMA than with SMM (Crouhy et al., 2013).


Since the expected shortfall is the expected loss beyond the value-at-risk, it follows that
the historical expected shortfall is given by:

\[ \mathrm{ES}_\alpha(w; h) = \frac{1}{q_\alpha(n_S)} \sum_{s=1}^{n_S} \mathbb{1}\{L_s \ge \mathrm{VaR}_\alpha(w; h)\} \cdot L_s \]

or:

\[ \mathrm{ES}_\alpha(w; h) = -\frac{1}{q_\alpha(n_S)} \sum_{s=1}^{n_S} \mathbb{1}\{\Pi_s \le -\mathrm{VaR}_\alpha(w; h)\} \cdot \Pi_s \]

where \(q_\alpha(n_S) = \lfloor n_S(1 - \alpha) \rfloor\) is the integer part of \(n_S(1 - \alpha)\). We deduce that:

\[ \mathrm{ES}_\alpha(w; h) = -\frac{1}{q_\alpha(n_S)} \sum_{i=1}^{q_\alpha(n_S)} \Pi_{(i:n_S)} \]

Computing the historical expected shortfall then consists in averaging the first \(q_\alpha(n_S)\)
order statistics of the P&L. For example, if \(n_S\) is equal to 250 scenarios and \(\alpha = 97.5\%\), we
obtain \(n_S(1 - \alpha) = 6.25\) and \(q_\alpha(n_S) = 6\). In Basel III, computing the historical ES is then
equivalent to averaging the 6 largest losses of the 250 historical scenarios. In the table below,
we indicate the value of \(q_\alpha(n_S)\) for different values of \(n_S\) and \(\alpha\):


 α \ n_S   100   150   200   250   300   350   400   450   500   1000
 90.0%      10    15    20    25    30    35    40    45    50    100
 95.0%       5     7    10    12    15    17    20    22    25     50
 97.5%       2     3     5     6     7     8    10    11    12     25
 99.0%       1     1     2     2     3     3     4     4     5     10

Let us consider Example 13 on page 68. We have found that the historical value-at-risk
\(\mathrm{VaR}_{99\%}(w; \text{one day})\) of the Apple/Coca-Cola portfolio was equal to $47.39. The 99%
expected shortfall is the average of the two largest losses:

\[ \mathrm{ES}_{99\%}(w; \text{one day}) = \frac{84.34 + 51.46}{2} = \$67.90 \]

However, the confidence level is set to 97.5% in Basel III, meaning that the expected shortfall
is the average of the six largest losses:

\[ \mathrm{ES}_{97.5\%}(w; \text{one day}) = \frac{84.34 + 51.46 + 43.31 + 40.75 + 35.91 + 35.42}{6} = \$48.53 \]
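The averaging rule for the historical expected shortfall is easy to sketch in code. In the hypothetical helpers below, the confidence level is passed as a percentage string and handled with exact rational arithmetic; this is a design choice made here because \(n_S(1 - \alpha)\) computed in binary floating point can fall just below an integer (for example 100 × (1 − 0.9) evaluates to 9.999999999999998), which would shift the integer part by one.

```python
from fractions import Fraction
from math import floor

def q_alpha(n_scenarios, alpha_pct):
    """Integer part of n_S * (1 - alpha); alpha_pct is a string such as "97.5"."""
    tail = 1 - Fraction(alpha_pct) / 100     # exact tail probability
    return floor(n_scenarios * tail)

def historical_es(pnl, alpha_pct):
    """Average of the q_alpha(n_S) largest losses, i.e. smallest P&Ls."""
    q = q_alpha(len(pnl), alpha_pct)
    if q < 1:
        raise ValueError("not enough scenarios for this confidence level")
    worst = sorted(pnl)[:q]
    return -sum(worst) / q

# Six worst P&Ls of Example 13, padded with zeros (hypothetical filler)
# to reach 250 scenarios; the filler does not enter the two averages below.
pnl = [-84.34, -51.46, -43.31, -40.75, -35.91, -35.42] + [0.0] * 244
es_99 = historical_es(pnl, "99")
es_975 = historical_es(pnl, "97.5")
print(round(es_99, 2), round(es_975, 2))
```

Because only the worst two (respectively six) scenarios are used, the result does not depend on the rest of the sample, which illustrates how few observations drive the estimate.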


2.2.2.2 The kernel approach 


Let \(\{x_1, \ldots, x_n\}\) be a sample of the random variable \(X\). In Section 10.1.4.1 on page 637,
we show that we can replace the empirical distribution \(\hat{F}(x) = n^{-1} \sum_{i=1}^n \mathbb{1}\{x_i \le x\}\) by
the kernel estimator:

\[ \hat{F}(x) = \frac{1}{n} \sum_{i=1}^n \mathcal{I}\left(\frac{x - x_i}{h}\right) \]

where \(\mathcal{I}\) is the integrated kernel function and \(h\) is the bandwidth.

To estimate the value-at-risk with a confidence level \(\alpha\), Gouriéroux et al. (2000) solve
the equation \(\hat{F}_L(\mathrm{VaR}_\alpha(w; h)) = \alpha\) or:

\[ \frac{1}{n_S} \sum_{s=1}^{n_S} \mathcal{I}\left(\frac{-\mathrm{VaR}_\alpha(w; h) - \Pi_s(w)}{h}\right) = 1 - \alpha \]

If we consider Example 13 on page 68 with the last 250 historical scenarios, we obtain
the results given in Figure 2.6. We have reported the estimated distribution \(\hat{F}_\Pi\) of \(\Pi(w)\)
based on order statistic and Gaussian kernel methods[43]. We verify that the kernel approach
produces a smoother distribution. If we zoom on the 1% quantile, we notice that the two
methods give similar results. The daily VaR with the kernel approach is equal to $47.44
whereas it was equal to $47.39 with the order statistic approach.


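For computing the non-parametric expected shortfall, we use the following result[44]:

\[ \mathbb{E}\left[X \cdot \mathbb{1}\{X \le x\}\right] \approx \frac{1}{n} \sum_{i=1}^n x_i \, \mathcal{I}\left(\frac{x - x_i}{h}\right) \]

Therefore, Scaillet (2004) shows that the kernel estimator of the expected shortfall is equal
to:

\[ \mathrm{ES}_\alpha(w; h) = -\frac{1}{(1 - \alpha)\, n_S} \sum_{s=1}^{n_S} \Pi_s \, \mathcal{I}\left(\frac{-\mathrm{VaR}_\alpha(w; h) - \Pi_s}{h}\right) \]

In the case of the Apple/Coca-Cola example, we obtain \(\mathrm{ES}_{99\%}(w; h) = \$60.53\) and
\(\mathrm{ES}_{97.5\%}(w; h) = \$45.28\). With the kernel approach, we can estimate the value-at-risk and
the expected shortfall with a high confidence level \(\alpha\). For instance, if \(\alpha = 99.75\%\), we have
\((1 - \alpha)\, n_S = 0.625 < 1\). Therefore, it is impossible to estimate the VaR or the ES with 250
observations using order statistics, whereas the kernel estimator remains well defined. In our
example, we obtain \(\mathrm{VaR}_{99.75\%}(w; h) = \$58.27\) and \(\mathrm{ES}_{99.75\%}(w; h) = \$77.32\).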
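The kernel approach can be sketched as follows. This is an illustration under stated assumptions, not the computation behind the figures quoted above: the P&L sample is simulated (the Apple/Coca-Cola scenarios are not reproduced here), the kernel is Gaussian, the bandwidth follows the rule \(h = 1.364 \times n^{-1/5} \times \hat{\sigma}(\Pi)\), and the VaR equation is solved by bisection, which is one possible root-finding choice among others.

```python
import random
from statistics import NormalDist, stdev

PHI = NormalDist().cdf   # integrated Gaussian kernel I(u) = Phi(u)

def kernel_cdf(x, pnl, h):
    """Kernel estimate of the P&L distribution function at x."""
    return sum(PHI((x - p) / h) for p in pnl) / len(pnl)

def kernel_var(pnl, alpha, h, tol=1e-8):
    """Solve F_hat_Pi(-VaR) = 1 - alpha by bisection on the P&L axis."""
    lo, hi = min(pnl) - 10.0 * h, max(pnl) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, pnl, h) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)

def kernel_es(pnl, alpha, h):
    """Kernel expected shortfall: weighted sum of P&Ls beyond the kernel VaR."""
    var = kernel_var(pnl, alpha, h)
    weighted = sum(p * PHI((-var - p) / h) for p in pnl)
    return -weighted / ((1.0 - alpha) * len(pnl))

# Hypothetical P&L sample (simulated, for illustration only).
rng = random.Random(0)
pnl = [rng.gauss(0.0, 17.7) for _ in range(250)]
h = 1.364 * len(pnl) ** (-1 / 5) * stdev(pnl)

var_99 = kernel_var(pnl, 0.99, h)
es_99 = kernel_es(pnl, 0.99, h)
var_9975 = kernel_var(pnl, 0.9975, h)   # defined even though 250 * 0.0025 < 1
print(round(var_99, 2), round(es_99, 2), round(var_9975, 2))
```

The last call shows the practical advantage discussed above: the smoothed distribution yields a quantile at 99.75% even with 250 scenarios, at the price of relying on the kernel and bandwidth assumptions in the extreme tail.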


[43] We consider the Gaussian kernel defined by \(\mathcal{K}(u) = \phi(u)\) and \(\mathcal{I}(u) = \Phi(u)\). The estimated standard
deviation \(\hat{\sigma}(\Pi)\) is equal to 17.7147, while the bandwidth is \(h = 1.364 \times n^{-1/5} \times \hat{\sigma}(\Pi) = 8.0027\).
[44] See Exercise 2.4.12 on page 124.

FIGURE 2.6: Kernel estimation of the historical VaR (legend: Empirical, Kernel)


Remark 7 Monte Carlo simulations reveal that the kernel method reduces the variance of
the VaR estimation, but not the variance of the ES estimation (Chen, 2007). In practice,
the kernel approach gives figures similar to those of the order statistic approach, especially when
the number of scenarios is large. However, the two estimators may differ in the presence of
fat tails. For large confidence levels, the method based on order statistics seems to be more
conservative.


2.2.3 Analytical methods 
2.2.3.1 Derivation of the closed-form formula 


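Gaussian value-at-risk We speak about analytical value-at-risk when we are able to
find a closed-form formula of \(F_L^{-1}(\alpha)\). Suppose that \(L(w) \sim \mathcal{N}(\mu(L), \sigma^2(L))\). In this case,
we have \(\Pr\{L(w) \le F_L^{-1}(\alpha)\} = \alpha\) or:

\[ \Pr\left\{\frac{L(w) - \mu(L)}{\sigma(L)} \le \frac{F_L^{-1}(\alpha) - \mu(L)}{\sigma(L)}\right\} = \alpha \Leftrightarrow \Phi\left(\frac{F_L^{-1}(\alpha) - \mu(L)}{\sigma(L)}\right) = \alpha \]

We deduce that:

\[ \frac{F_L^{-1}(\alpha) - \mu(L)}{\sigma(L)} = \Phi^{-1}(\alpha) \Leftrightarrow F_L^{-1}(\alpha) = \mu(L) + \Phi^{-1}(\alpha)\, \sigma(L) \]

The expression of the value-at-risk is then[45]:

\[ \mathrm{VaR}_\alpha(w; h) = \mu(L) + \Phi^{-1}(\alpha)\, \sigma(L) \tag{2.6} \]

[45] We also have \(\mathrm{VaR}_\alpha(w; h) = -\mu(\Pi) + \Phi^{-1}(\alpha)\, \sigma(\Pi)\) because the P&L \(\Pi(w)\) is the opposite of the
portfolio loss \(L(w)\), meaning that \(\mu(\Pi) = -\mu(L)\) and \(\sigma(\Pi) = \sigma(L)\).

This formula is known as the Gaussian value-at-risk. For instance, if \(\alpha = 99\%\) (resp. 95%),
\(\Phi^{-1}(\alpha)\) is equal to 2.33 (resp. 1.65) and we have:

\[ \mathrm{VaR}_\alpha(w; h) = \mu(L) + 2.33 \times \sigma(L) \]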


Remark 8 We notice that the value-at-risk depends on the parameters \(\mu(L)\) and \(\sigma(L)\).
This is why the analytical value-at-risk is also called the parametric value-at-risk. In
practice, we don’t know these parameters and we have to estimate them. This implies that the
analytical value-at-risk is also an estimator. For the Gaussian distribution, we obtain:

\[ \widehat{\mathrm{VaR}}_\alpha(w; h) = \hat{\mu}(L) + \Phi^{-1}(\alpha)\, \hat{\sigma}(L) \]

In practice, it is extremely difficult to estimate the mean and we set \(\hat{\mu}(L) = 0\).


Example 14 We consider a short position of $1 mn on the S&P 500 futures contract. We
estimate that the annualized volatility \(\hat{\sigma}_{\mathrm{SPX}}\) is equal to 35%. Calculate the daily value-at-risk
with a 99% confidence level.

The portfolio loss is equal to \(L(w) = N \times R_{\mathrm{SPX}}\) where \(N\) is the exposure amount
(-$1 mn) and \(R_{\mathrm{SPX}}\) is the (Gaussian) return of the S&P 500 index. We deduce that the
annualized loss volatility is \(\hat{\sigma}(L) = |N| \times \hat{\sigma}_{\mathrm{SPX}}\). The value-at-risk for a one-year holding
period is:

\[ \mathrm{VaR}_{99\%}(w; \text{one year}) = 2.33 \times 10^6 \times 0.35 = \$815\,500 \]

By using the square-root-of-time rule, we deduce that:

\[ \mathrm{VaR}_{99\%}(w; \text{one day}) = \frac{815\,500}{\sqrt{260}} = \$50\,575 \]

This means that we have a 1% probability to lose more than $50 575 per day.
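Example 14 can be reproduced with a few lines of code. The sketch below uses the exact Gaussian quantile \(\Phi^{-1}(0.99) \approx 2.326\) instead of the rounded value 2.33, so the results differ slightly from the figures in the text; the function name `gaussian_var` and the convention of 260 trading days are the assumptions made here.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_var(mu, sigma, alpha):
    """Gaussian value-at-risk: mu(L) + Phi^{-1}(alpha) * sigma(L)."""
    return mu + NormalDist().inv_cdf(alpha) * sigma

exposure = 1_000_000.0        # |N|, size of the short position in $
annual_vol = 0.35             # annualized volatility of the index
trading_days = 260            # convention used for the square-root-of-time rule

sigma_annual = exposure * annual_vol
var_1y = gaussian_var(0.0, sigma_annual, 0.99)   # mean set to zero
var_1d = var_1y / sqrt(trading_days)
print(round(var_1y), round(var_1d))
```

The one-day figure is obtained by dividing the annual VaR by \(\sqrt{260}\), which is legitimate here only because the Gaussian VaR with zero mean is proportional to the volatility.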


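In finance, the standard model is the Black-Scholes model where the price \(S_t\) of the asset
is a geometric Brownian motion:

\[ \mathrm{d}S_t = \mu_S S_t \, \mathrm{d}t + \sigma_S S_t \, \mathrm{d}W_t \]

and \(W_t\) is a Wiener process. We can show that:

\[ \ln S_{t_2} - \ln S_{t_1} = \left(\mu_S - \frac{1}{2}\sigma_S^2\right)(t_2 - t_1) + \sigma_S \left(W_{t_2} - W_{t_1}\right) \]

for \(t_2 \ge t_1\). We have \(W_{t_2} - W_{t_1} = \sqrt{t_2 - t_1}\, \varepsilon\) where \(\varepsilon \sim \mathcal{N}(0, 1)\). We finally deduce that
\(\mathrm{var}(\ln S_{t_2} - \ln S_{t_1}) = \sigma_S^2 (t_2 - t_1)\). Let \(R_S(\Delta t)\) be a sample of log-returns measured at a
regular time interval \(\Delta t\). It follows that:

\[ \hat{\sigma}_S = \frac{1}{\sqrt{\Delta t}} \cdot \hat{\sigma}\left(R_S(\Delta t)\right) \]

If we consider two sample periods \(\Delta t\) and \(\Delta t'\), we obtain the following relationship:

\[ \hat{\sigma}\left(R_S(\Delta t')\right) = \sqrt{\frac{\Delta t'}{\Delta t}} \cdot \hat{\sigma}\left(R_S(\Delta t)\right) \]

For the mean, we have \(\hat{\mu}_S = \Delta t^{-1} \cdot \mathbb{E}[R_S(\Delta t)]\) and \(\mathbb{E}(R_S(\Delta t')) = (\Delta t'/\Delta t) \cdot \mathbb{E}(R_S(\Delta t))\).
We notice that the square-root-of-time rule is only valid for the volatility and therefore for
risk measures that are linear with respect to the volatility. In practice, there is no other
solution and this explains why this rule continues to be used even if we know that the
approximation is poor when the portfolio loss is not Gaussian.

Gaussian expected shortfall By definition, we have:

\[ \begin{aligned} \mathrm{ES}_\alpha(w) &= \mathbb{E}\left[L(w) \mid L(w) \ge \mathrm{VaR}_\alpha(w)\right] \\ &= \frac{1}{1 - \alpha} \int_{F_L^{-1}(\alpha)}^{\infty} x f_L(x) \, \mathrm{d}x \end{aligned} \]

where \(f_L\) and \(F_L\) are the density and distribution functions of the loss \(L(w)\). In the Gaussian
case \(L(w) \sim \mathcal{N}(\mu(L), \sigma^2(L))\), we have \(\mathrm{VaR}_\alpha(w) = F_L^{-1}(\alpha) = \mu(L) + \Phi^{-1}(\alpha)\, \sigma(L)\) and:

\[ \mathrm{ES}_\alpha(w) = \frac{1}{1 - \alpha} \int_{\mu(L) + \Phi^{-1}(\alpha)\sigma(L)}^{\infty} \frac{x}{\sigma(L)\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu(L)}{\sigma(L)}\right)^2\right) \mathrm{d}x \]

With the variable change \(t = \sigma(L)^{-1}(x - \mu(L))\), we obtain:

\[ \begin{aligned} \mathrm{ES}_\alpha(w) &= \frac{1}{1 - \alpha} \int_{\Phi^{-1}(\alpha)}^{\infty} \left(\mu(L) + \sigma(L)\, t\right) \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}t^2\right) \mathrm{d}t \\ &= \frac{\mu(L)}{1 - \alpha} \left[\Phi(t)\right]_{\Phi^{-1}(\alpha)}^{\infty} + \frac{\sigma(L)}{(1 - \alpha)\sqrt{2\pi}} \int_{\Phi^{-1}(\alpha)}^{\infty} t \exp\left(-\frac{1}{2}t^2\right) \mathrm{d}t \\ &= \mu(L) + \frac{\sigma(L)}{(1 - \alpha)\sqrt{2\pi}} \left[-\exp\left(-\frac{1}{2}t^2\right)\right]_{\Phi^{-1}(\alpha)}^{\infty} \\ &= \mu(L) + \frac{\sigma(L)}{(1 - \alpha)\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left[\Phi^{-1}(\alpha)\right]^2\right) \end{aligned} \]

The expected shortfall of the portfolio \(w\) is then:

\[ \mathrm{ES}_\alpha(w) = \mu(L) + \frac{\phi\left(\Phi^{-1}(\alpha)\right)}{1 - \alpha}\, \sigma(L) \]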


When the portfolio loss is Gaussian, the value-at-risk and the expected shortfall are both
standard deviation-based risk measures. They coincide when the scaling parameters \(c_{\mathrm{VaR}} =
\Phi^{-1}(\alpha_{\mathrm{VaR}})\) and \(c_{\mathrm{ES}} = \phi\left(\Phi^{-1}(\alpha_{\mathrm{ES}})\right) / (1 - \alpha_{\mathrm{ES}})\) are equal[46]. In Table 2.8, we report the
values taken by \(c_{\mathrm{VaR}}\) and \(c_{\mathrm{ES}}\). We notice that the 97.5% Gaussian expected shortfall is very
close to the 99% Gaussian value-at-risk.


TABLE 2.8: Scaling factors \(c_{\mathrm{VaR}}\) and \(c_{\mathrm{ES}}\)

 α (in %)   95.0   96.0   97.0   97.5   98.0   98.5   99.0   99.5
 c_VaR      1.64   1.75   1.88   1.96   2.05   2.17   2.33   2.58
 c_ES       2.06   2.15   2.27   2.34   2.42   2.52   2.67   2.89


Remark 9 In the Gaussian case, the Basel III framework consists in replacing the scaling 
factor 2.33 by 2.34. In what follows, we focus on the VaR, because the ES figures can be 
directly deduced. 
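The two scaling factors are straightforward to compute, which gives a quick way to check the values of Table 2.8 and the proximity between the 97.5% ES and the 99% VaR. The function names `c_var` and `c_es` below are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def c_var(alpha):
    """Gaussian VaR scaling factor Phi^{-1}(alpha)."""
    return STD_NORMAL.inv_cdf(alpha)

def c_es(alpha):
    """Gaussian ES scaling factor phi(Phi^{-1}(alpha)) / (1 - alpha)."""
    quantile = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.pdf(quantile) / (1.0 - alpha)

for alpha in (0.95, 0.96, 0.97, 0.975, 0.98, 0.985, 0.99, 0.995):
    print(f"{100 * alpha:5.1f}  {c_var(alpha):.2f}  {c_es(alpha):.2f}")
```

Evaluating `c_es` at a confidence level close to 97.42% gives approximately the same value as `c_var(0.99)`, which is the equivalence behind the choice of 97.5% in Basel III.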


[46] The equality is achieved when \((\alpha_{\mathrm{VaR}}, \alpha_{\mathrm{ES}})\) is equal to (90%, 75.44%), (95%, 87.45%), (99%, 97.42%),
(99.9%, 99.74%), etc.


2.2.3.2 Linear factor models


We consider a portfolio of \(n\) assets and a pricing function \(g\) which is linear with respect
to the asset prices. We have:

\[ g(F_t; w) = \sum_{i=1}^n w_i P_{i,t} \]

We deduce that the random P&L is:

\[ \begin{aligned} \Pi(w) &= P_{t+h}(w) - P_t(w) \\ &= \sum_{i=1}^n w_i P_{i,t+h} - \sum_{i=1}^n w_i P_{i,t} \\ &= \sum_{i=1}^n w_i \left(P_{i,t+h} - P_{i,t}\right) \end{aligned} \]

Here, \(P_{i,t}\) is known whereas \(P_{i,t+h}\) is stochastic. The first idea is to choose the factors as
the future prices. The problem is that prices are far from being stationary, meaning that we will
face some issues to model the distribution \(F_\Pi\). Another idea is to write the future price as
follows:

\[ P_{i,t+h} = P_{i,t}\left(1 + R_{i,t+h}\right) \]

where \(R_{i,t+h}\) is the asset return between \(t\) and \(t + h\). In this case, we obtain:

\[ \Pi(w) = \sum_{i=1}^n w_i P_{i,t} R_{i,t+h} \]

In this approach, the asset returns are the market risk factors and each asset has its own
risk factor.


The covariance model Let \(R_t\) be the vector of asset returns. We note \(W_{i,t} = w_i P_{i,t}\) the
wealth invested (or the nominal exposure) in asset \(i\) and \(W_t = (W_{1,t}, \ldots, W_{n,t})\). It follows
that:

\[ \Pi(w) = \sum_{i=1}^n W_{i,t} R_{i,t+h} = W_t^\top R_{t+h} \]

If we assume that \(R_{t+h} \sim \mathcal{N}(\mu, \Sigma)\), we deduce that \(\mu(\Pi) = W_t^\top \mu\) and \(\sigma^2(\Pi) = W_t^\top \Sigma W_t\).
Using Equation (2.6), the expression of the value-at-risk is[47]:

\[ \mathrm{VaR}_\alpha(w; h) = -W_t^\top \mu + \Phi^{-1}(\alpha) \sqrt{W_t^\top \Sigma W_t} \]

In this approach, we only need to estimate the covariance matrix of asset returns to compute
the value-at-risk. This explains the popularity of this model, especially when the P&L of
the portfolio is a linear function of the asset returns[48].

Let us consider our previous Apple/Coca-Cola example. The nominal exposures[49] are
$1 093.3 (Apple) and $842.8 (Coca-Cola). If we consider the historical prices from 2014-01-07
to 2015-01-02, the estimated standard deviation of daily returns is equal to 1.3611% for
Apple and 0.9468% for Coca-Cola, whereas the cross-correlation is equal to 12.0787%. It
follows that:

\[ \begin{aligned} \sigma^2(\Pi) &= W_t^\top \Sigma W_t \\ &= 1\,093.3^2 \times \left(\frac{1.3611}{100}\right)^2 + 842.8^2 \times \left(\frac{0.9468}{100}\right)^2 + 2 \times \frac{12.0787}{100} \times 1\,093.3 \times 842.8 \times \frac{1.3611}{100} \times \frac{0.9468}{100} \\ &= 313.80 \end{aligned} \]

If we omit the term of expected return \(-W_t^\top \mu\), we deduce that the 99% daily value-at-risk[50]
is equal to $41.21. We obtain a lower figure than with the historical value-at-risk, which was
equal to $47.39. This result is explained by the fact that the Gaussian distribution underestimates
the probability of extreme events and is not adapted to take into account tail risk.

[47] For the expected shortfall formula, we replace \(\Phi^{-1}(\alpha)\) by \(\phi\left(\Phi^{-1}(\alpha)\right) / (1 - \alpha)\).
[48] For instance, this approach is frequently used by asset managers to measure the risk of equity portfolios.
[49] These figures are equal to 10 × 109.33 and 20 × 42.14.
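The covariance model is simple to implement once exposures, volatilities and correlations are available. The sketch below reproduces the Apple/Coca-Cola figures from the inputs quoted in the text; the function names are hypothetical and the expected return term is set to zero by default.

```python
from math import sqrt
from statistics import NormalDist

def pnl_variance(exposures, vols, corr):
    """Quadratic form W' Sigma W with Sigma_ij = corr_ij * vol_i * vol_j."""
    n = len(exposures)
    return sum(
        exposures[i] * exposures[j] * corr[i][j] * vols[i] * vols[j]
        for i in range(n)
        for j in range(n)
    )

def covariance_var(exposures, vols, corr, alpha, mu=None):
    """Gaussian VaR: -W' mu + Phi^{-1}(alpha) * sqrt(W' Sigma W)."""
    mean_pnl = 0.0 if mu is None else sum(w * m for w, m in zip(exposures, mu))
    sigma = sqrt(pnl_variance(exposures, vols, corr))
    return -mean_pnl + NormalDist().inv_cdf(alpha) * sigma

exposures = [10 * 109.33, 20 * 42.14]        # nominal exposures in $
vols = [0.013611, 0.009468]                  # daily return volatilities
corr = [[1.0, 0.120787], [0.120787, 1.0]]    # cross-correlation of 12.0787%

variance = pnl_variance(exposures, vols, corr)
var_99 = covariance_var(exposures, vols, corr, 0.99)
print(round(variance, 2), round(var_99, 2))
```

Only the covariance matrix and the exposures are needed, which explains why this approach scales easily to portfolios with many linear positions.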


The factor model We consider the standard linear factor model where asset returns \(R_t\)
are related to a set of risk factors \(\mathcal{F}_t = (\mathcal{F}_{1,t}, \ldots, \mathcal{F}_{m,t})\) in the following way:

\[ R_t = B \mathcal{F}_t + \varepsilon_t \]

where \(\mathbb{E}(\mathcal{F}_t) = \mu(\mathcal{F})\), \(\mathrm{cov}(\mathcal{F}_t) = \Omega\), \(\mathbb{E}(\varepsilon_t) = 0\) and \(\mathrm{cov}(\varepsilon_t) = D\). \(\mathcal{F}_t\) represents the common
risks whereas \(\varepsilon_t\) is the vector of specific or idiosyncratic risks. This implies that \(\mathcal{F}_t\) and \(\varepsilon_t\)
are independent and \(D\) is a diagonal matrix[51]. \(B\) is a \((n \times m)\) matrix that measures the
sensitivity of asset returns with respect to the risk factors. The first two moments of \(R_t\) are
given by:

\[ \mu = \mathbb{E}[R_t] = B \mu(\mathcal{F}) \]

and[52]:

\[ \Sigma = \mathrm{cov}(R_t) = B \Omega B^\top + D \]

If we assume that asset returns are Gaussian, we deduce that[53]:

\[ \mathrm{VaR}_\alpha(w; h) = -W_t^\top B \mu(\mathcal{F}) + \Phi^{-1}(\alpha) \sqrt{W_t^\top \left(B \Omega B^\top + D\right) W_t} \]

The linear factor model plays a major role in financial modeling. The capital asset pricing
model (CAPM) developed by Sharpe (1964) is a particular case of this model when there is
a single factor, which corresponds to the market portfolio. In the arbitrage pricing theory
(APT) of Ross (1976), \(\mathcal{F}_t\) corresponds to a set of (unknown) arbitrage factors. They may
be macro-economic, statistical or characteristic-based factors. The three-factor model of
Fama and French (1993) is certainly the most famous application of APT. In this case, the
factors are the market factor, the size factor corresponding to a long/short portfolio between
small stocks and large stocks and the value factor, which is the return of stocks with high
book-to-market values minus the return of stocks with low book-to-market values. Since
its publication, the original Fama-French model has been extended to many other factors
including momentum, quality or liquidity factors[54].

[50] We have:

\[ \mathrm{VaR}_{99\%}(w; \text{one day}) = \Phi^{-1}(0.99) \sqrt{313.80} = \$41.21 \]

[51] In the following, we note \(D = \mathrm{diag}(\tilde{\sigma}_1^2, \ldots, \tilde{\sigma}_n^2)\) where \(\tilde{\sigma}_i\) is the idiosyncratic volatility of asset \(i\).
[52] We have:

\[ \begin{aligned} \Sigma &= \mathbb{E}\left[(R_t - \mu)(R_t - \mu)^\top\right] \\ &= \mathbb{E}\left[\left(B(\mathcal{F}_t - \mu(\mathcal{F})) + \varepsilon_t\right)\left(B(\mathcal{F}_t - \mu(\mathcal{F})) + \varepsilon_t\right)^\top\right] \\ &= B\, \mathbb{E}\left[(\mathcal{F}_t - \mu(\mathcal{F}))(\mathcal{F}_t - \mu(\mathcal{F}))^\top\right] B^\top + \mathbb{E}\left[\varepsilon_t \varepsilon_t^\top\right] \\ &= B \Omega B^\top + D \end{aligned} \]

[53] For the expected shortfall formula, we replace \(\Phi^{-1}(\alpha)\) by \(\phi\left(\Phi^{-1}(\alpha)\right) / (1 - \alpha)\).
BCBS (1996a) makes direct reference to CAPM. In this case, we obtain a single-factor
model:

\[ R_{i,t} = \alpha_i + \beta_i R_{m,t} + \varepsilon_{i,t} \]

where \(R_{m,t}\) is the return of the market and \(\beta = (\beta_1, \ldots, \beta_n)\) is the vector of beta coefficients.
Let \(\sigma_m\) be the volatility of the market risk factor. We have \(\mathrm{var}(R_{i,t}) = \beta_i^2 \sigma_m^2 + \tilde{\sigma}_i^2\) and
\(\mathrm{cov}(R_{i,t}, R_{j,t}) = \beta_i \beta_j \sigma_m^2\). By omitting the mean, we obtain:

\[ \mathrm{VaR}_\alpha(w; h) = \Phi^{-1}(\alpha) \sqrt{\sigma_m^2 \left(\sum_{i=1}^n \tilde{\beta}_i^2 + 2 \sum_{j > i} \tilde{\beta}_i \tilde{\beta}_j\right) + \sum_{i=1}^n W_{i,t}^2 \tilde{\sigma}_i^2} \]

where \(\tilde{\beta}_i = W_{i,t} \beta_i\) is the beta exposure of asset \(i\) expressed in $. With the previous formula,
we can calculate the VaR due to the market risk factor by omitting the specific risk[55].

If we consider our previous example, we can choose the S&P 500 index as the market
risk factor. For the period 2014-01-07 to 2015-01-02, the beta coefficient is equal to 0.8307
for Apple and 0.4556 for Coca-Cola, whereas the corresponding idiosyncratic volatilities
are 1.2241% (Apple) and 0.8887% (Coca-Cola). As the market volatility is estimated at
0.7165%, the daily value-at-risk is equal to $41.68 if we include specific risks. Otherwise, it
is equal to $21.54 if we only consider the effect of the market risk factor.
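The single-factor computation can be sketched directly from the formula, using the fact that the market term inside the square root is the square of the total beta exposure. The function name `capm_var` and its `include_specific` flag are illustrative; the inputs are the estimates quoted in the text.

```python
from math import sqrt
from statistics import NormalDist

def capm_var(exposures, betas, idio_vols, market_vol, alpha, include_specific=True):
    """Gaussian VaR in the single-factor model, with the mean omitted."""
    # sum(b_i^2) + 2 * sum_{j>i} b_i b_j equals (sum b_i)^2 for beta exposures b_i.
    beta_exposure = sum(w * b for w, b in zip(exposures, betas))
    variance = (market_vol * beta_exposure) ** 2
    if include_specific:
        variance += sum((w * s) ** 2 for w, s in zip(exposures, idio_vols))
    return NormalDist().inv_cdf(alpha) * sqrt(variance)

exposures = [10 * 109.33, 20 * 42.14]   # Apple, Coca-Cola (in $)
betas = [0.8307, 0.4556]
idio_vols = [0.012241, 0.008887]        # daily idiosyncratic volatilities
market_vol = 0.007165                   # daily volatility of the S&P 500 index

var_total = capm_var(exposures, betas, idio_vols, market_vol, 0.99)
var_market = capm_var(exposures, betas, idio_vols, market_vol, 0.99,
                      include_specific=False)
print(round(var_total, 2), round(var_market, 2))
```

The gap between the two figures shows that, for a two-stock portfolio, most of the risk is idiosyncratic rather than driven by the market factor.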


FIGURE 2.7: Cash flows of two bonds and two short exposures


[54] See Cazalet and Roncalli (2014) for a survey.
[55] We set \(\tilde{\sigma}_i\) to 0.

Application to a bond portfolio We consider a portfolio of bonds from the same issuer.
In this instance, we can model the bond portfolio by a stream of \(n_C\) coupons \(C(t_m)\) with
fixed dates \(t_m \ge t\). Figure 2.7 presents an example of aggregating cash flows with two
bonds with a fixed coupon rate and two short exposures. We note \(B_t(T)\) the price of a
zero-coupon bond at time \(t\) for the maturity \(T\). We have \(B_t(T) = e^{-(T - t) R_t(T)}\) where \(R_t(T)\) is
the zero-coupon rate. The sensitivity of the zero-coupon bond is:

\[ \frac{\partial B_t(T)}{\partial R_t(T)} = -(T - t)\, B_t(T) \]

For a small change in yield, we obtain:

\[ \Delta_h B_{t+h}(T) \approx -(T - t)\, B_t(T)\, \Delta_h R_{t+h}(T) \]

The value of the portfolio is:

\[ P_t(w) = \sum_{m=1}^{n_C} C(t_m)\, B_t(t_m) \]

We deduce that:

\[ \begin{aligned} \Pi(w) &= P_{t+h}(w) - P_t(w) \\ &= \sum_{m=1}^{n_C} C(t_m) \left(B_{t+h}(t_m) - B_t(t_m)\right) \end{aligned} \]

Let us consider the following approximation:

\[ \begin{aligned} \Pi(w) &\approx -\sum_{m=1}^{n_C} C(t_m)\, (t_m - t)\, B_t(t_m)\, \Delta_h R_{t+h}(t_m) \\ &= \sum_{m=1}^{n_C} W_{i,t_m}\, \Delta_h R_{t+h}(t_m) \end{aligned} \]

where \(W_{i,t_m} = -C(t_m)\, (t_m - t)\, B_t(t_m)\). This expression of the P&L is similar to that
obtained with a portfolio of stocks. If we assume that the yield variations are Gaussian, the
value-at-risk is equal to:

\[ \mathrm{VaR}_\alpha(w; h) = -W_t^\top \mu + \Phi^{-1}(\alpha) \sqrt{W_t^\top \Sigma W_t} \]

where \(\mu\) and \(\Sigma\) are the mean and the covariance matrix of the vector of yield changes
\(\left(\Delta_h R_{t+h}(t_1), \ldots, \Delta_h R_{t+h}(t_{n_C})\right)\).


Example 15 We consider an exposure on a US bond at 31 December 2014. The notional
of the bond is 100 whereas the annual coupons are equal to 5. The remaining maturity is
five years and the fixing dates are at the end of December. The number of bonds held in the
portfolio is 10 000.

Using the US zero-coupon rates[56], we obtain the following figures for one bond at 31
December 2014:

 t_m - t   C(t_m)   R_t(t_m)   B_t(t_m)    W_{t_m}
    1         5      0.431%     0.996       -4.978
    2         5      0.879%     0.983       -9.826
    3         5      1.276%     0.962      -14.437
    4         5      1.569%     0.939      -18.783
    5       105      1.777%     0.915     -480.356

At the end of December 2014, the one-year zero-coupon rate is 0.431%, the two-year
zero-coupon rate is 0.879%, etc. We deduce that the bond price is $115.47 and the total exposure
is $1 154 706. Using the historical period of year 2014, we estimate the covariance matrix
between daily changes of the five zero-coupon rates[57]. We deduce that the Gaussian VaR
of the bond portfolio is equal to $4 971. If the multiplicative factor \(m_c\) is set to 3, the
required capital \(\mathcal{K}^{\mathrm{VaR}}\) is equal to $47 158 or 4.08% of the mark-to-market. We can compare
these figures with those obtained with the historical value-at-risk. In this instance, the daily
value-at-risk is higher and equal to $5 302.

[56] The data comes from the Datastream database. The zero-coupon interest rate of maturity yy years and
mm months corresponds to the code USyyYmm.
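The Gaussian VaR of Example 15 can be recomputed from the published inputs. The sketch below takes the zero-coupon rates from the table and the standard deviations and correlations of daily yield changes reported in footnote 57; because these inputs are rounded, the results match the figures in the text only up to a few dollars. Variable names are illustrative and the mean of yield changes is set to zero.

```python
from math import exp, sqrt
from statistics import NormalDist

maturities = [1, 2, 3, 4, 5]                              # t_m - t in years
coupons = [5.0, 5.0, 5.0, 5.0, 105.0]                     # cash flows of one bond
zero_rates = [0.00431, 0.00879, 0.01276, 0.01569, 0.01777]
n_bonds = 10_000

discount = [exp(-T * r) for T, r in zip(maturities, zero_rates)]
bond_price = sum(c * b for c, b in zip(coupons, discount))
# Exposure to each zero-coupon rate: W = -C(t_m) (t_m - t) B_t(t_m), for all bonds.
exposures = [-n_bonds * c * T * b for c, T, b in zip(coupons, maturities, discount)]

# Daily yield-change volatilities (bps converted to decimals) and correlations (%).
vols = [v * 1e-4 for v in (0.746, 2.170, 3.264, 3.901, 4.155)]
corr_lower = [
    [100.000],
    [87.205, 100.000],
    [79.809, 97.845, 100.000],
    [75.584, 95.270, 98.895, 100.000],
    [71.944, 92.110, 96.556, 99.219, 100.000],
]

def corr(i, j):
    """Correlation read from the lower-triangular table."""
    i, j = max(i, j), min(i, j)
    return corr_lower[i][j] / 100.0

n = len(exposures)
variance = sum(
    exposures[i] * exposures[j] * corr(i, j) * vols[i] * vols[j]
    for i in range(n)
    for j in range(n)
)
var_1d = NormalDist().inv_cdf(0.99) * sqrt(variance)
capital = 3 * sqrt(10) * var_1d
print(round(bond_price, 2), round(var_1d), round(capital))
```

Almost all of the risk comes from the exposure to the five-year rate, which carries the redemption of the notional.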


Remark 10 The previous analysis assumes that the risk factors correspond to the yield 
changes, meaning that the calculated value-at-risk only concerns interest rate risk. Therefore, 
it cannot capture all the risks if the bond portfolio is subject to credit risk. 


Defining risk factors with the principal component analysis In the previous para- 
graph, the bond portfolio was very simple with only one bond and one yield curve. In 
practice, the bond portfolio contains streams of coupons for many maturities and yield 
curves. It is therefore necessary to reduce the dimension of the VaR calculation. The un- 
derlying idea is that we don’t need to use the comprehensive set of zero-coupon rates to 
represent the set of risk factors that affects the yield curve. For instance, Nelson and Siegel 
(1987) propose a three-factor parametric model to define the yield curve. Another represen- 
tation of the yield curve has been formulated by Litterman and Scheinkman (1991), who 
have proposed to characterize the factors using the principal component analysis (PCA). 


Let \(\Sigma\) be the covariance matrix associated to the random vector \(X_t\) of dimension \(n\). We
consider the eigendecomposition \(\Sigma = V \Lambda V^{\top}\) where \(\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)\) is the diagonal
matrix of eigenvalues with \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\) and \(V\) is an orthonormal matrix. In the
principal component analysis, the (endogenous) risk factors are \(F_t = V^{\top} X_t\). The reduction
method by PCA consists in selecting the first \(m\) risk factors with \(m \leq n\). When applied to
the value-at-risk calculation, it can be achieved in two different ways:


1. In the parametric approach, the covariance matrix \(\Sigma\) is replaced by \(\Sigma^{\star} = V \Lambda^{\star} V^{\top}\)
where \(\Lambda^{\star} = \operatorname{diag}(\lambda_1, \ldots, \lambda_m, 0, \ldots, 0)\).


2. In the historical method, we only consider the first \(m\) PCA factors \(F_t^{\star} = (F_{1,t}, \ldots, F_{m,t})\)
or equivalently the modified random vector⁵⁸ \(X_t^{\star} = V F_t^{\bullet}\) where
\(F_t^{\bullet} = (F_t^{\star}, \mathbf{0}_{n-m})\).
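To make the parametric variant concrete, here is a minimal sketch in pure Python. It rebuilds the covariance matrix from the standard deviations and correlations reported in Footnote 57, extracts the first eigenpair by power iteration and forms the rank-one matrix \(\Sigma^{\star}\) for \(m = 1\). Power iteration is used here only to avoid external libraries; it is not presented as the method used in the text, and all function and variable names are illustrative.

```python
import math

# Daily volatilities (in bps) and correlations (in %) taken from Footnote 57
vol_bps = [0.746, 2.170, 3.264, 3.901, 4.155]
corr_pct = [
    [100.000],
    [87.205, 100.000],
    [79.809, 97.845, 100.000],
    [75.584, 95.270, 98.895, 100.000],
    [71.944, 92.110, 96.556, 99.219, 100.000],
]

n = len(vol_bps)
vol = [v * 1e-4 for v in vol_bps]  # bps -> decimal
sigma = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1):
        sigma[i][j] = sigma[j][i] = corr_pct[i][j] / 100 * vol[i] * vol[j]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    size = len(matrix)
    v = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' M v with a unit-norm vector v
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(size) for j in range(size))
    return eigenvalue, v


lambda_1, v_1 = leading_eigenpair(sigma)
total_variance = sum(sigma[i][i] for i in range(n))  # trace = sum of the eigenvalues

# Parametric approach with m = 1: Sigma* = lambda_1 * v_1 v_1'
sigma_star = [[lambda_1 * v_1[i] * v_1[j] for j in range(n)] for i in range(n)]

print(f"lambda_1 = {lambda_1 * 1e8:.3f} x 1e-8")
print(f"share of total variance = {lambda_1 / total_variance:.1%}")
```

Keeping more factors would require the next eigenpairs, for instance by deflating the matrix or by calling a linear algebra library.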


If we apply this extracting method of risk factors to Example 15, the eigenvalues are
equal to \(47.299 \times 10^{-8}\), \(0.875 \times 10^{-8}\), \(0.166 \times 10^{-8}\), \(0.046 \times 10^{-8}\) and \(0.012 \times 10^{-8}\) whereas
the matrix \(V\) of eigenvectors is:

\[
V = \begin{pmatrix}
0.084 & -0.375 & -0.711 & 0.589 & 0.002 \\
0.303 & -0.610 & -0.215 & -0.690 & -0.114 \\
0.470 & -0.389 & 0.515 & 0.305 & 0.519 \\
0.567 & 0.103 & 0.195 & 0.223 & -0.762 \\
0.599 & 0.570 & -0.381 & -0.183 & 0.371
\end{pmatrix}
\]

We deduce that:

\[
\begin{aligned}
F_{1,t} &= 0.084 \times R_t(t+1) + 0.303 \times R_t(t+2) + \cdots + 0.599 \times R_t(t+5) \\
&\;\;\vdots \\
F_{5,t} &= 0.002 \times R_t(t+1) - 0.114 \times R_t(t+2) + \cdots + 0.371 \times R_t(t+5)
\end{aligned}
\]

57 The standard deviation is respectively equal to 0.746 bps for \(\Delta_h R_t(t+1)\), 2.170 bps for \(\Delta_h R_t(t+2)\),
3.264 bps for \(\Delta_h R_t(t+3)\), 3.901 bps for \(\Delta_h R_t(t+4)\) and 4.155 bps for \(\Delta_h R_t(t+5)\) where \(h\) corresponds
to one trading day. For the correlation matrix (in %), we get:

\[
\rho = \begin{pmatrix}
100.000 & & & & \\
87.205 & 100.000 & & & \\
79.809 & 97.845 & 100.000 & & \\
75.584 & 95.270 & 98.895 & 100.000 & \\
71.944 & 92.110 & 96.556 & 99.219 & 100.000
\end{pmatrix}
\]

58 Because we have \(V^{-1} = V^{\top}\).


We retrieve the three factors of Litterman and Scheinkman, which are a level factor \(F_{1,t}\),
a slope factor \(F_{2,t}\) and a convexity or curvature factor \(F_{3,t}\). In the following table, we
report the incremental VaR of each risk factor, which is defined as the difference between the
value-at-risk including the risk factor and the value-at-risk excluding the risk factor:


VaR             F1,t      F2,t     F3,t    F4,t   F5,t       Sum
Gaussian     4934.71     32.94     2.86    0.17   0.19   4970.87
Historical   5857.39   −765.44   216.58   −7.98   1.41   5301.95


We notice that the value-at-risk is principally explained by the first risk factor, that is the 
general level of interest rates, whereas the contribution of the slope and convexity factors 
is small and the contribution of the remaining risk factors is marginal. This result can be 
explained by the long-only characteristics of the portfolio. Nevertheless, even if we consider
a more complex bond portfolio, we generally observe that a small number of factors is sufficient to
model all the risk dimensions of the yield curve. An example is provided in Figure 2.8 with
a stream of long and short exposures⁵⁹. Using the period January 2014 – December 2014,
the convergence of the value-at-risk is achieved with six factors. This result is connected to 
the requirement of the Basel Committee that “banks must model the yield curve using a 
minimum of six risk factors”. 


2.2.3.3 Volatility forecasting 


The challenge of the Gaussian value-at-risk is the estimation of the loss volatility or the 
covariance matrix of asset returns/risk factors. The issue is not to consider the best estimate 
for describing the past, but to use the best estimate for forecasting the loss distribution. In 
the previous illustrations, we use the empirical covariance matrix or the empirical standard 
deviation. However, other estimators have been proposed by academics and professionals. 


FIGURE 2.8: Convergence of the VaR with PCA risk factors


The original approach implemented in RiskMetrics used an exponentially weighted moving
average (EWMA) for modeling the covariance between asset returns⁶⁰:

\[
\hat{\Sigma}_t = \lambda \hat{\Sigma}_{t-1} + (1 - \lambda) R_{t-1} R_{t-1}^{\top}
\]

where the parameter \(\lambda \in [0, 1]\) is the decay factor, which represents the degree of weighting
decrease. Using a finite sample, the previous estimate is equivalent to a weighted estimator:

\[
\hat{\Sigma}_t = \sum_{s=1}^{n_S} \omega_s R_{t-s} R_{t-s}^{\top}
\]

where:

\[
\omega_s = \frac{1 - \lambda}{1 - \lambda^{n_S}} \lambda^{s-1}
\]

In Figure 2.9, we represent the weights \(\omega_s\) for different values of \(\lambda\) when the number \(n_S\) of
historical scenarios is equal to 250. We verify that this estimator gives more importance to
the current values than to the past values. For instance, if \(\lambda\) is equal to 0.94⁶¹, 50% of the
weights correspond to the first twelve observations and the half-life is 16.7 days. We also
observe that the case \(\lambda = 1\) corresponds to the standard covariance estimator with uniform
weights.


59 We have \(C_t(t+1/2) = 400\), \(C_t(t+1) = 300\), \(C_t(t+3/2) = 200\), \(C_t(t+2) = -200\), \(C_t(t+3) = -300\),
\(C_t(t+4) = -500\), \(C_t(t+5) = 500\), \(C_t(t+6) = 400\), \(C_t(t+7) = -300\), \(C_t(t+10) = -700\), \(C_t(t+10) = 300\)
and \(C_t(t+30) = 700\).

60 We assume that the mean of expected returns is equal to 0.
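The weighting scheme can be sketched in a few lines of Python for a single asset. The function names are ours, and the convention that the first element of the sample is the most recent observation is an assumption of this sketch.

```python
def ewma_weights(decay, n_scenarios):
    """Weights w_s = (1 - decay) / (1 - decay**n) * decay**(s - 1) for s = 1, ..., n."""
    if not 0.0 < decay < 1.0:
        raise ValueError("this sketch covers 0 < decay < 1 (decay = 1 gives uniform weights)")
    scale = (1.0 - decay) / (1.0 - decay ** n_scenarios)
    return [scale * decay ** (s - 1) for s in range(1, n_scenarios + 1)]


def ewma_variance(returns, decay):
    """Weighted variance of zero-mean returns; returns[0] is the most recent observation."""
    weights = ewma_weights(decay, len(returns))
    return sum(w * r * r for w, r in zip(weights, returns))


weights = ewma_weights(decay=0.94, n_scenarios=250)
print(f"sum of the weights                = {sum(weights):.6f}")
print(f"weight of the first 12 scenarios  = {sum(weights[:12]):.1%}")
```

The second printed figure is the cumulative weight of the twelve most recent scenarios, which the text describes as about 50% of the total.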


FIGURE 2.9: Weights of the EWMA estimator


Another approach to model volatility in risk management is to consider that the volatility
is time-varying. In 1982, Engle introduced a class of stochastic processes in order to take
into account the heteroscedasticity of asset returns⁶²:

\[
R_{i,t} = \mu_i + \varepsilon_t \quad \text{where} \quad \varepsilon_t = \sigma_t e_t \quad \text{and} \quad e_t \sim \mathcal{N}(0, 1)
\]

The time-varying variance \(h_t = \sigma_t^2\) satisfies the following equation:

\[
h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2
\]

where \(\alpha_j \geq 0\) for all \(j \geq 0\). We note that the conditional variance of \(\varepsilon_t\) is not constant
and depends on the past values of \(\varepsilon_t\). A substantial impact on the asset return \(R_{i,t}\) implies
an increase of the conditional variance of \(\varepsilon_{t+1}\) at time \(t+1\) and therefore an increase of
the probability to observe another substantial impact on \(R_{i,t+1}\). Therefore, this means that
the volatility is persistent, which is a well-known stylized fact in finance (Chou, 1988).
This type of stochastic processes, known as ARCH models (Autoregressive Conditional
Heteroscedasticity), has been extended by Bollerslev (1986) in the following way:

\[
h_t = \alpha_0 + \gamma_1 h_{t-1} + \gamma_2 h_{t-2} + \cdots + \gamma_p h_{t-p} + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2
\]

In this case, the conditional variance depends also on its past values and we obtain a
GARCH(p,q) model. If \(\sum_{i=1}^{p} \gamma_i + \sum_{i=1}^{q} \alpha_i = 1\), we may show that the process \(\varepsilon_t^2\) has a unit
root and the model is called an integrated GARCH (or IGARCH) process. If we neglect the
constant term, the expression of the IGARCH(1,1) process is \(h_t = (1 - \alpha) h_{t-1} + \alpha R_{i,t-1}^2\)
or equivalently:

\[
\sigma_t^2 = (1 - \alpha) \sigma_{t-1}^2 + \alpha R_{i,t-1}^2
\]

This estimator is then an exponentially weighted moving average with a factor \(\lambda\) equal to
\(1 - \alpha\).


61 It was the original value of the RiskMetrics system (J.P. Morgan, 1996).

62 See Section 10.2.4.1 on page 664 for a comprehensive presentation of ARCH and GARCH models.
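The link between the IGARCH(1,1) recursion and the EWMA estimator can be verified numerically. The sketch below assumes zero-mean returns, so that the innovation equals the return; the function names, the return series and the starting variance are hypothetical and only serve the illustration.

```python
def garch11_variance(returns, alpha0, alpha1, gamma1, h0):
    """Conditional variances h_t = alpha0 + gamma1 * h_{t-1} + alpha1 * R_{t-1}**2.

    Assumes zero-mean returns. Returns [h_0, h_1, ..., h_T], where h_T is the
    one-step-ahead variance forecast.
    """
    variances = [h0]
    for r in returns:
        variances.append(alpha0 + gamma1 * variances[-1] + alpha1 * r * r)
    return variances


def ewma_variance_recursive(returns, decay, h0):
    """RiskMetrics recursion h_t = decay * h_{t-1} + (1 - decay) * R_{t-1}**2."""
    variances = [h0]
    for r in returns:
        variances.append(decay * variances[-1] + (1.0 - decay) * r * r)
    return variances


# Hypothetical daily returns, for illustration only
returns = [0.004, -0.012, 0.007, -0.025, 0.018, 0.001, -0.003]
alpha = 0.06

igarch = garch11_variance(returns, alpha0=0.0, alpha1=alpha, gamma1=1.0 - alpha, h0=1e-4)
ewma = ewma_variance_recursive(returns, decay=1.0 - alpha, h0=1e-4)

# Largest gap between the two variance paths (zero up to floating-point error)
print(max(abs(a - b) for a, b in zip(igarch, ewma)))
```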

In Figure 2.10, we have reported the annualized volatility of the S&P 500 index estimated
using the GARCH model (first panel). The ML estimates of the parameters are \(\hat{\gamma}_1 = 0.8954\)
(lagged variance) and \(\hat{\alpha}_1 = 0.0929\) (lagged squared innovation). We verify that this estimated
model is close to an IGARCH process. In
the other panels, we compare the GARCH volatility with the empirical one-year historical
volatility, the EWMA volatility (with \(\lambda = 0.94\)) and a short volatility based on 20 trading
days. We observe large differences between the GARCH volatility and the one-year historical
volatility, but the two other estimators (EWMA and short volatility) give similar results to
the GARCH estimator. To compare the out-of-sample forecasting accuracy of these different
models, we consider respectively a long and a short exposure on the S&P 500 index. At time
\(t\), we compute the value-at-risk for the next day and we compare this figure with the realized
mark-to-market. Table 2.9 shows the number of exceptions per year for the different models:
(1) GARCH(1,1) model, (2) Gaussian value-at-risk with a one-year historical volatility, (3)
EWMA model with \(\lambda = 0.94\), (4) Gaussian value-at-risk with a twenty-day short volatility
and (5) historical value-at-risk based on the last 260 trading days. We observe that the
GARCH model produces the smallest number of exceptions, whereas the largest number
of exceptions occurs in the case of the Gaussian value-at-risk with the one-year historical
volatility. We also notice that the number of exceptions is smaller for the short exposure
than for the long exposure. This is due to the asymmetry of returns, because extreme
negative returns are larger than extreme positive returns on average.

FIGURE 2.10: Comparison of GARCH and EWMA volatilities


TABLE 2.9: Number of exceptions per year for long and short exposures on the S&P 500
index

2.2.3.4 Extension to other probability distributions


The Gaussian value-at-risk has been strongly criticized because it depends only on the 
first two moments of the loss distribution. Indeed, there is a lot of evidence that asset returns 
and risk factors are not Gaussian (Cont, 2001). They generally present fat tails and skew 
effects. It is therefore interesting to consider alternative probability distributions, which are 
more appropriate to take into account these stylized facts. 

Let \(\mu_r = \mathbb{E}[(X - \mathbb{E}[X])^r]\) be the r-order centered moment of the random variable \(X\).
The skewness \(\gamma_1 = \mu_3 / \mu_2^{3/2}\) is the measure of the asymmetry of the loss distribution. If
\(\gamma_1 < 0\) (resp. \(\gamma_1 > 0\)), the distribution is left-skewed (resp. right-skewed) because the left
(resp. right) tail is longer. For the Gaussian distribution, \(\gamma_1\) is equal to zero. To characterize
whether the distribution is peaked or flat relative to the normal distribution, we consider the
excess kurtosis \(\gamma_2 = \mu_4 / \mu_2^2 - 3\). If \(\gamma_2 > 0\), the distribution presents heavy tails. In the case of
the Gaussian distribution, \(\gamma_2\) is exactly equal to zero. We have illustrated the skewness and
kurtosis statistics in Figure 2.11. Whereas we generally encounter skewness risk in credit and
hedge fund portfolios, kurtosis risk has a stronger impact in equity portfolios. For example,
if we consider the daily returns of the S&P 500 index, we obtain an empirical distribution⁶³
which has a higher kurtosis than the fitted Gaussian distribution (Figure 2.12).
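The two statistics translate directly into code. The sketch below uses population moments (division by the sample size, without bias correction), which is an assumption of ours, and the two small samples are hypothetical.

```python
def central_moment(sample, order):
    """Empirical centered moment mu_r = mean((x - mean(x))**r)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** order for x in sample) / len(sample)


def skewness(sample):
    """gamma_1 = mu_3 / mu_2**(3/2)."""
    return central_moment(sample, 3) / central_moment(sample, 2) ** 1.5


def excess_kurtosis(sample):
    """gamma_2 = mu_4 / mu_2**2 - 3."""
    return central_moment(sample, 4) / central_moment(sample, 2) ** 2 - 3.0


# Hypothetical samples: one symmetric, one with a long right tail
symmetric_sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed_sample = [0.0, 0.0, 0.0, 0.0, 10.0]

for name, sample in [("symmetric", symmetric_sample), ("right-skewed", skewed_sample)]:
    print(f"{name}: skewness = {skewness(sample):.2f}, excess kurtosis = {excess_kurtosis(sample):.2f}")
```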
FIGURE 2.11: Examples of skewed and fat-tailed distributions


An example of fat-tail distributions is the Student's t probability distribution. If \(X \sim t_{\nu}\),
we have \(\mathbb{E}[X] = 0\) and \(\operatorname{var}(X) = \nu / (\nu - 2)\) for \(\nu > 2\). Because \(X\) has a fixed mean and
variance for a given degrees of freedom, we need to introduce location and scale parameters
to model the future loss \(L(w) = \xi + \omega X\). To calculate the value-at-risk, we proceed as in
the Gaussian case. We have:

\[
\Pr\left\{L(w) \leq F_L^{-1}(\alpha)\right\} = \alpha \;\Leftrightarrow\; \Pr\left\{X \leq \frac{F_L^{-1}(\alpha) - \xi}{\omega}\right\} = \alpha
\]

We deduce that:

\[
\mathbf{T}\left(\frac{F_L^{-1}(\alpha) - \xi}{\omega}; \nu\right) = \alpha \;\Leftrightarrow\; F_L^{-1}(\alpha) = \xi + \mathbf{T}^{-1}(\alpha; \nu)\, \omega
\]

In practice, the parameters \(\xi\) and \(\omega\) are estimated by the method of moments⁶⁴. We finally
deduce that:

\[
\mathrm{VaR}_{\alpha}(w; h) = \mu(L) + \mathbf{T}^{-1}(\alpha; \nu) \sqrt{\frac{\nu - 2}{\nu}}\, \sigma(L)
\]

Let us illustrate the impact of the probability distribution with Example 13. By using
different values of \(\nu\), we obtain the following daily VaRs:


ν             3.00    3.50    4.00    5.00    6.00   10.00    1000       ∞
ω            10.23   11.60   12.53   13.72   14.46   15.84   17.70   17.71
VaRα(w; h)   46.44   47.09   46.93   46.17   45.46   43.79   41.24   41.21


If \(\nu \to \infty\), we verify that the Student's t value-at-risk converges to the Gaussian
value-at-risk ($41.21). If the degrees of freedom is equal to 4, it is closer to the historical value-at-risk
($47.39).


63 It is estimated using the kernel approach.

FIGURE 2.12: Estimated distribution of S&P 500 daily returns (2007-2014)


We can derive closed-form formulas for several probability distributions. However, most
of them are not used in practice, because these methods are not appealing from a professional
point of view. Nevertheless, one approach is very popular among professionals. Using the
Cornish-Fisher expansion of the normal distribution, Zangari (1996) proposes to estimate
the value-at-risk in the following way:

\[
\mathrm{VaR}_{\alpha}(w; h) = \mu(L) + \mathfrak{z}(\alpha; \gamma_1(L), \gamma_2(L)) \times \sigma(L) \tag{2.7}
\]

where:

\[
\mathfrak{z}(\alpha; \gamma_1, \gamma_2) = z_{\alpha} + \frac{1}{6}\left(z_{\alpha}^2 - 1\right)\gamma_1 + \frac{1}{24}\left(z_{\alpha}^3 - 3 z_{\alpha}\right)\gamma_2 - \frac{1}{36}\left(2 z_{\alpha}^3 - 5 z_{\alpha}\right)\gamma_1^2 \tag{2.8}
\]

and \(z_{\alpha} = \Phi^{-1}(\alpha)\). This is the same formula as the one used for the Gaussian value-at-risk
but with another scaling parameter⁶⁵. In Equation (2.7), the skewness and excess kurtosis
coefficients are those of the loss distribution⁶⁶.


64 We have \(\mathbb{E}[\xi + \omega X] = \xi\) and \(\operatorname{var}(\xi + \omega X) = (\omega^2 \nu) / (\nu - 2)\).


TABLE 2.10: Value of the Cornish-Fisher quantile \(\mathfrak{z}(99\%; \gamma_1, \gamma_2)\)


                                     γ2
   γ1     0.00   1.00   2.00   3.00   4.00   5.00   6.00   7.00
 −2.00                                                     0.99
 −1.00                  1.68   1.92   2.15   2.38   2.62   2.85
 −0.50           2.10   2.33   2.57   2.80   3.03   3.27   3.50
  0.00    2.33   2.56   2.79   3.03   3.26   3.50   3.73   3.96
  0.50           2.83   3.07   3.30   3.54   3.77   4.00   4.24
  1.00                  3.15   3.39   3.62   3.85   4.09   4.32
  2.00                                                     3.93


Table 2.10 shows the value of the Cornish-Fisher quantile \(\mathfrak{z}(99\%; \gamma_1, \gamma_2)\) for different
values of skewness and excess kurtosis. We cannot always calculate the quantile because
Equation (2.8) does not necessarily define a probability distribution if the parameters \(\gamma_1\)
and \(\gamma_2\) do not satisfy the following condition (Maillard, 2018):

\[
\frac{\gamma_1^2}{9} - 4\left(\frac{\gamma_2}{8} - \frac{\gamma_1^2}{6}\right)\left(1 - \frac{\gamma_2}{8} + \frac{5\gamma_1^2}{36}\right) \leq 0
\]

We have reported the domain of definition in the third panel in Figure 2.13. For instance,
Equation (2.8) is not valid if the skewness is equal to 2 and the excess kurtosis is equal to 3.
If we analyze results in Table 2.10, we do not observe that there is a monotone relationship
between the skewness and the quantile. To understand this curious behavior, we report the
partial derivatives of \(\mathfrak{z}(\alpha; \gamma_1, \gamma_2)\) with respect to \(\gamma_1\) and \(\gamma_2\) in Figure 2.13. We notice that
their signs depend on the confidence level \(\alpha\), but also on the skewness for \(\partial_{\gamma_1} \mathfrak{z}(\alpha; \gamma_1, \gamma_2)\).
Another drawback of the Cornish-Fisher approach concerns the statistical moments, which
are not necessarily equal to the input parameters if the skewness and the kurtosis are
not close to zero⁶⁷. Contrary to what professionals commonly think, the Cornish-Fisher
expansion is therefore difficult to implement.
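Equations (2.7) and (2.8) and the domain condition can be sketched as follows. The function names are ours, and the validity test simply encodes the inequality attributed above to Maillard (2018).

```python
from statistics import NormalDist


def cornish_fisher_quantile(alpha, skew, excess_kurt):
    """Equation (2.8): Cornish-Fisher quantile for a given skewness and excess kurtosis."""
    z = NormalDist().inv_cdf(alpha)
    return (
        z
        + (z ** 2 - 1.0) * skew / 6.0
        + (z ** 3 - 3.0 * z) * excess_kurt / 24.0
        - (2.0 * z ** 3 - 5.0 * z) * skew ** 2 / 36.0
    )


def is_valid_cornish_fisher(skew, excess_kurt):
    """Domain condition under which Equation (2.8) defines a probability distribution."""
    lhs = skew ** 2 / 9.0 - 4.0 * (excess_kurt / 8.0 - skew ** 2 / 6.0) * (
        1.0 - excess_kurt / 8.0 + 5.0 * skew ** 2 / 36.0
    )
    return lhs <= 0.0


def cornish_fisher_var(alpha, mean_loss, vol_loss, skew, excess_kurt):
    """Equation (2.7): the moments are those of the loss distribution."""
    return mean_loss + cornish_fisher_quantile(alpha, skew, excess_kurt) * vol_loss


for skew, kurt in [(0.0, 0.0), (0.0, 1.0), (1.0, 2.0), (-2.0, 7.0), (2.0, 3.0)]:
    if is_valid_cornish_fisher(skew, kurt):
        print(f"gamma_1 = {skew:5.2f}, gamma_2 = {kurt:4.2f}: {cornish_fisher_quantile(0.99, skew, kurt):.2f}")
    else:
        print(f"gamma_1 = {skew:5.2f}, gamma_2 = {kurt:4.2f}: outside the domain of definition")
```

Checking the domain before computing the quantile avoids reporting a number for parameter pairs that do not correspond to a distribution.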


FIGURE 2.13: Derivatives and definition domain of the Cornish-Fisher expansion


When we consider probability distributions other than the normal distribution, the
difficulty concerns the multivariate case. In the previous examples, we directly model the loss
distribution, that is the reduced form of the pricing system. To model the joint distribution
of risk factors, two main approaches are available. The first approach considers copula
functions and the value-at-risk is calculated using the Monte Carlo simulation method (see
Chapters 11 and 13). The second approach consists in selecting a multivariate probability
distribution, which has some appealing properties. For instance, it should be flexible enough
to calibrate the first two moments of the risk factors and should also include asymmetry
(positive and negative skewness) and fat tails (positive excess kurtosis) in a natural way. In
order to obtain an analytical formula for the value-at-risk, it must be tractable and verify
the closure property under affine transformation. This implies that if the random vector \(X\)
follows a certain class of distribution, then the random vector \(Y = A + BX\) also belongs to
the same class. These properties dramatically reduce the set of eligible multivariate
probability distributions, because the potential candidates are mostly elliptical distributions.
Such examples are the skew normal and t distributions presented in Appendix A.2.1 on
page 1057.


65 If \(\gamma_1 = \gamma_2 = 0\), we retrieve the Gaussian value-at-risk because \(\mathfrak{z}(\alpha; 0, 0) = \Phi^{-1}(\alpha)\).

66 If we prefer to use the moments of the P&L, we have to consider the relationships \(\gamma_1(L) = -\gamma_1(\Pi)\)
and \(\gamma_2(L) = \gamma_2(\Pi)\).

67 Let \(Z\) be a Cornish-Fisher random variable satisfying \(F^{-1}(\alpha) = \mathfrak{z}(\alpha; \gamma_1, \gamma_2)\). A direct application of
the result in Appendix A.2.2.3 gives:

\[
\mathbb{E}[Z^r] = \int_0^1 \mathfrak{z}^r(\alpha; \gamma_1, \gamma_2)\, \mathrm{d}\alpha
\]

Using numerical integration, we can show that \(\gamma_1(Z) \neq \gamma_1\) and \(\gamma_2(Z) \neq \gamma_2\) if \(\gamma_1\) and \(\gamma_2\) are large enough
(Maillard, 2018).


Example 16 We consider a portfolio of three assets and assume that their annualized
returns follow a multivariate skew normal distribution. The location parameters are equal
to 1%, −2% and 15% whereas the scale parameters are equal to 5%, 10% and 20%. The
correlation parameters to describe the dependence between the skew normal variables are
given by the following matrix:

\[
C = \begin{pmatrix}
1.00 & & \\
0.35 & 1.00 & \\
0.20 & -0.50 & 1.00
\end{pmatrix}
\]

The three assets have different skewness profiles, and the shape parameters are equal to 0,
10 and −15.50.

FIGURE 2.14: Skew normal and t distributions of asset returns


In Figure 2.14, we have reported the density function of the three asset returns⁶⁸. The
return of the first asset is close to Gaussian whereas the two other assets exhibit
respectively positive and negative skews. Moments are given in the table below:


Asset i   μi (in %)   σi (in %)    γ1,i    γ2,i
   1         1.07        5.00      0.00    0.00
   2         4.36        7.72      0.24    0.13
   3         0.32       13.58     −0.54    0.39


Let us consider the nominal portfolio \(w = (\$500, \$200, \$300)\). The annualized P&L \(\Pi(w)\) is
equal to \(w^{\top} R\) where \(R \sim \mathcal{SN}(\xi, \Omega, \eta)\). We deduce that \(\Pi(w) \sim \mathcal{SN}(\xi_w, \omega_w, \eta_w)\) with \(\xi_w = 46.00\),
\(\omega_w = 66.14\) and \(\eta_w = -0.73\). We finally deduce that the one-year 99% value-at-risk is
equal to $123.91. If we use the multivariate skew t distribution in place of the multivariate
skew normal distribution to model asset returns and if we use the same parameter values,
the one-year 99% value-at-risk becomes $558.35 for \(\nu = 2\), $215.21 for \(\nu = 5\) and $130.47 for
\(\nu = 50\). We verify that the skew t value-at-risk converges to the skew normal value-at-risk
when the number of degrees of freedom \(\nu\) tends to \(+\infty\).
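The location and scale parameters of the portfolio P&L can be recovered from the inputs of Example 16. The sketch below assumes that the scale matrix is \(\Omega = \operatorname{diag}(\omega)\, C \operatorname{diag}(\omega)\), which reproduces the figures above; the shape parameter \(\eta_w\) requires the formulas of Appendix A.2.1 and is not computed here.

```python
import math

# Parameters of Example 16
location = [0.01, -0.02, 0.15]        # location parameters
scale = [0.05, 0.10, 0.20]            # scale parameters
corr = [
    [1.00, 0.35, 0.20],
    [0.35, 1.00, -0.50],
    [0.20, -0.50, 1.00],
]
w = [500.0, 200.0, 300.0]             # nominal portfolio (in $)

n = len(w)

# Assumption: Omega = diag(scale) * C * diag(scale)
omega = [[scale[i] * corr[i][j] * scale[j] for j in range(n)] for i in range(n)]

xi_w = sum(w[i] * location[i] for i in range(n))
omega_w = math.sqrt(sum(w[i] * omega[i][j] * w[j] for i in range(n) for j in range(n)))

print(f"xi_w = {xi_w:.2f}, omega_w = {omega_w:.2f}")
```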


The choice of the probability distribution is an important issue and raises the question
of model risk. In this instance, the Basel Committee justifies the introduction of the penalty
coefficient in order to reduce the risk of a wrong specification (Stahl, 1997). For example,
imagine that we calculate the value-at-risk with a probability distribution \(\mathbf{F}\) while the true
probability distribution of the portfolio loss is \(\mathbf{H}\). The multiplication factor \(m_c\) defines then
a capital buffer such that we are certain that the confidence level of the value-at-risk will
be at least equal to \(\alpha\):

\[
\Pr\Big\{L(w) \leq \underbrace{m_c \cdot \mathrm{VaR}_{\alpha}^{(\mathbf{F})}(w)}_{\text{Capital}}\Big\} \geq \alpha \tag{2.9}
\]

This implies that \(\mathbf{H}\big(m_c \cdot \mathrm{VaR}_{\alpha}^{(\mathbf{F})}(w)\big) \geq \alpha\) and \(m_c \cdot \mathrm{VaR}_{\alpha}^{(\mathbf{F})}(w) \geq \mathbf{H}^{-1}(\alpha)\). We finally
deduce that:

\[
m_c \geq \frac{\mathrm{VaR}_{\alpha}^{(\mathbf{H})}(w)}{\mathrm{VaR}_{\alpha}^{(\mathbf{F})}(w)}
\]

In the case where \(\mathbf{F}\) and \(\mathbf{H}\) are the normal and Student's t distributions, we obtain⁶⁹:

\[
m_c \geq \sqrt{\frac{\nu - 2}{\nu}}\, \frac{\mathbf{T}_{\nu}^{-1}(\alpha)}{\Phi^{-1}(\alpha)}
\]

Below is the lower bound of \(m_c\) for different values of \(\alpha\) and \(\nu\).


α / ν        3      4      5      6     10     50    100
90%        0.74   0.85   0.89   0.92   0.96   0.99   1.00
95%        1.13   1.14   1.12   1.10   1.06   1.01   1.01
99%        1.31   1.26   1.21   1.18   1.10   1.02   1.01
99.9%      1.91   1.64   1.48   1.38   1.20   1.03   1.02
99.99%     3.45   2.48   2.02   1.76   1.37   1.06   1.03


For instance, we have \(m_c \geq 1.31\) when \(\alpha = 99\%\) and \(\nu = 3\).


68 We also show the density function in the case of the skew t distribution with \(\nu = 1\) and \(\nu = 4\).


Stahl (1997) considers the general case when \(\mathbf{F}\) is the normal distribution and \(\mathbf{H}\) is
an unknown probability distribution. Let \(X\) be a given random variable. Chebyshev's
inequality states that:

\[
\Pr\{|X - \mu(X)| > k \cdot \sigma(X)\} \leq k^{-2}
\]

for any real number \(k > 0\). If we apply this theorem to the value-at-risk, we obtain⁷⁰:

\[
\Pr\left\{L(w) \leq \sqrt{\frac{1}{1 - \alpha}}\, \sigma(L)\right\} \geq \alpha
\]

Using Equation (2.9), we deduce that:

\[
m_c = \sqrt{\frac{1}{1 - \alpha}}\, \frac{\sigma(L)}{\mathrm{VaR}_{\alpha}^{(\mathbf{F})}(w)}
\]

In the case of the normal distribution, we finally obtain that the multiplicative factor is:

\[
m_c = \frac{1}{\Phi^{-1}(\alpha)} \sqrt{\frac{1}{1 - \alpha}}
\]

This ratio is the multiplication factor to apply in order to be sure that the confidence
level of the value-at-risk is at least equal to \(\alpha\) if we use the normal distribution to model
the portfolio loss. In the case where the probability distribution is symmetric, this ratio
becomes:

\[
m_c = \frac{1}{\Phi^{-1}(\alpha)} \sqrt{\frac{1}{2 - 2\alpha}}
\]

In Table 2.11, we report the values of \(m_c\) for different confidence levels. If \(\alpha\) is equal to 99%,
the multiplication factor is equal to 3.04 if the distribution is symmetric and 4.30 otherwise.


69 We recall that the Gaussian value-at-risk is equal to \(\Phi^{-1}(\alpha)\, \sigma(L)\) whereas the Student's t value-at-risk
is equal to \(\sqrt{(\nu - 2)/\nu} \cdot \mathbf{T}_{\nu}^{-1}(\alpha)\, \sigma(L)\).

70 We set \(\alpha = 1 - k^{-2}\).

TABLE 2.11: Value of the multiplication factor \(m_c\) deduced from Chebyshev's inequality


α (in %)     90.00   95.00   99.00   99.25   99.50   99.75   99.99
Symmetric     1.74    1.92    3.04    3.36    3.88    5.04   19.01
Asymmetric    2.47    2.72    4.30    4.75    5.49    7.12   26.89

Remark 11 Even if the previous analysis justifies the multiplication factor from a statis- 
tical point of view, we face two main issues. First, the multiplication factor assumes that 
the bank uses a Gaussian value-at-risk. It was the case for many banks in the early 1990s, 
but they use today historical value-at-risk measures. Some have suggested that the multipli- 
cation factor has been introduced in order to reduce the difference in terms of regulatory 
capital between SMM and IMA and it is certainly the case. The second issue concerns the 
specificity of the loss distribution. For many positions like long-only unlevered portfolios, 
the loss is bounded. If we use a Gaussian value-at-risk, the regulatory capital satisfies⁷¹
\(\mathcal{K} = \mathcal{K}^{\mathrm{VaR}} + \mathcal{K}^{\mathrm{SVaR}} \geq 13.98 \cdot \sigma(L)\) where \(\sigma(L)\) is the non-stressed loss volatility. This
implies that the value-at-risk is larger than the portfolio value if \(\sigma(L) > 7.2\%\)! There is a
direct contradiction here.


2.2.4 Monte Carlo methods

In this approach, we postulate a given probability distribution \(\mathbf{H}\) for the risk factors:

\[
(F_{1,t+h}, \ldots, F_{m,t+h}) \sim \mathbf{H}
\]

Then, we simulate \(n_S\) scenarios of risk factors and calculate the simulated P&L \(\Pi_s(w)\)
for each scenario \(s\). Finally, we estimate the risk measure (VaR/ES) by the method of
order statistics. The Monte Carlo method to calculate the VaR/ES is therefore close to the
historical method. The only difference is that it uses simulated scenarios instead of
historical scenarios. This implies that the Monte Carlo approach is not limited by the number
of scenarios. By construction, the Monte Carlo VaR/ES is also similar to the analytical
VaR/ES, because they both specify the parametric probability distribution of risk factors.
In summary, we can say that:


• the Monte Carlo VaR/ES is a historical VaR/ES with simulated scenarios;


• the Monte Carlo VaR/ES is a parametric VaR/ES for which it is difficult to find an
analytical formula.


Let us consider Example 16 on page 87. The expression of the P&L is:

\[
\Pi(w) = 500 \times R_1 + 200 \times R_2 + 300 \times R_3
\]

Because we know that the combination of the components of a skew normal random vector
is a skew normal random variable, we were able to compute the analytical quantile of \(\Pi(w)\)
at the 1% confidence level. Suppose now that we don't know the analytical distribution of
\(\Pi(w)\). We can repeat the exercise by using the Monte Carlo method. At each simulation \(s\),
we generate the random variates \((R_{1,s}, R_{2,s}, R_{3,s})\) such that:

\[
(R_{1,s}, R_{2,s}, R_{3,s}) \sim \mathcal{SN}(\xi, \Omega, \eta)
\]

and the corresponding P&L \(\Pi_s(w) = 500 \times R_{1,s} + 200 \times R_{2,s} + 300 \times R_{3,s}\). The Monte Carlo
value-at-risk is the \(n_S(1 - \alpha)\)-th order statistic:

\[
\widehat{\mathrm{VaR}}_{\alpha}(n_S) = -\Pi_{(n_S(1 - \alpha):n_S)}(w)
\]

Using the law of large numbers, we can show that the MC estimator converges to the exact
VaR:

\[
\lim_{n_S \to \infty} \widehat{\mathrm{VaR}}_{\alpha}(n_S) = \mathrm{VaR}_{\alpha}
\]

In Figure 2.15, we report four Monte Carlo runs with 10 000 simulated scenarios. We notice
that the convergence of the Monte Carlo VaR to the analytical VaR is slow⁷², because asset
returns present high skewness. The convergence will be faster if the probability distribution
of risk factors is close to normal and has no fat tails.


71 Because we have \(2 \times m_c \times 2.33 \geq 13.98\).
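The order-statistic estimator can be sketched as follows. Simulating the skew normal vector of Example 16 requires the representation given in Appendix A.2.1, so this sketch replaces it with a Gaussian P&L, for which the exact value-at-risk is known in closed form. The volatility of $100, the seed and the function name are hypothetical.

```python
import random
from statistics import NormalDist


def monte_carlo_var(simulated_pnl, alpha):
    """VaR as minus the n_S * (1 - alpha)-th order statistic of the simulated P&L."""
    ordered = sorted(simulated_pnl)                        # ascending: worst P&L first
    rank = max(round(len(ordered) * (1.0 - alpha)), 1)     # 1-based rank of the order statistic
    return -ordered[rank - 1]


# Assumption: Gaussian P&L with zero mean and a hypothetical volatility of $100
rng = random.Random(2024)
sigma, alpha, n_scenarios = 100.0, 0.99, 100_000
pnl = [rng.gauss(0.0, sigma) for _ in range(n_scenarios)]

mc_var = monte_carlo_var(pnl, alpha)
exact_var = NormalDist().inv_cdf(alpha) * sigma
print(f"Monte Carlo VaR = {mc_var:.2f}, exact VaR = {exact_var:.2f}")
```

Re-running the sketch with other seeds or a smaller number of scenarios gives a feeling for the sampling error of the estimator.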


[Figure with four panels: first, second, third and fourth replication; horizontal axis: number of simulations (×10⁴)]

FIGURE 2.15: Convergence of the Monte Carlo VaR when asset returns are skew normal 


Remark 12 The Monte Carlo value-at-risk has been extensively studied with heavy-tailed 
risk factors (Dupire, 1998; Eberlein et al., 1998; Glasserman et al., 2002). In those cases, 
one needs to use advanced and specific methods to reduce the variance of the estimator⁷³.


Example 17 We use a variant of Example 15 on page 78. We consider that the bond is 
exposed to credit risk. In particular, we assume that the current default intensity of the bond 
issuer is equal to 200 bps whereas the recovery rate is equal to 50%. 


In the case of a defaultable bond, the coupons and the notional are paid as long as the issuer
does not default whereas a recovery rate is applied if the issuer defaults before the maturity
of the bond. If we assume that the recovery is paid at maturity, we can show that the bond
price under default risk is:

\[
P_t = \sum_{t_m \geq t} C(t_m)\, B_t(t_m)\, \mathbf{S}_t(t_m) + N B_t(T) \left(\mathbf{S}_t(T) + \mathcal{R}_t \left(1 - \mathbf{S}_t(T)\right)\right)
\]

where \(\mathbf{S}_t(t_m)\) is the survival function at time \(t_m\) and \(\mathcal{R}_t\) is the current recovery rate.
We retrieve the formula of the bond price without default risk if \(\mathbf{S}_t(t_m) = 1\). Using the
numerical values of the parameters, the bond price is equal to $109.75 and is lower than
the non-defaultable bond price⁷⁴. If we assume that the default time is exponential with
\(\mathbf{S}_t(t_m) = e^{-\lambda_t (t_m - t)}\), we have:

\[
P_{t+h} = \sum_{t_m \geq t} C(t_m)\, e^{-(t_m - t - h) R_{t+h}(t_m)}\, e^{-\lambda_{t+h}(t_m - t - h)} + N e^{-(T - t - h) R_{t+h}(T)} \left(\mathcal{R}_{t+h} + \left(1 - \mathcal{R}_{t+h}\right) e^{-\lambda_{t+h}(T - t - h)}\right)
\]

We define the risk factors as the zero-coupon rates, the default intensity and the recovery
rate:

\[
\begin{aligned}
R_{t+h}(t_m) &\simeq R_t(t_m) + \Delta_h R_{t+h}(t_m) \\
\lambda_{t+h} &= \lambda_t + \Delta_h \lambda_{t+h} \\
\mathcal{R}_{t+h} &= \mathcal{R}_t + \Delta_h \mathcal{R}_{t+h}
\end{aligned}
\]

We assume that the three risk factors are independent and follow the following probability
distributions:

\[
\begin{aligned}
\left(\Delta_h R_{t+h}(t_1), \ldots, \Delta_h R_{t+h}(t_n)\right) &\sim \mathcal{N}(\mathbf{0}, \Sigma) \\
\Delta_h \lambda_{t+h} &\sim \mathcal{N}(0, \sigma_{\lambda}^2) \\
\Delta_h \mathcal{R}_{t+h} &\sim \mathcal{U}_{[a,b]}
\end{aligned}
\]

We can then simulate the daily P&L \(\Pi(w) = w (P_{t+h} - P_t)\) using the above specifications.
For the numerical application, we use the covariance matrix given in Footnote 57 whereas
the values of \(\sigma_{\lambda}\), \(a\) and \(b\) are equal to 20 bps, −10% and 10%. In Figure 2.16, we have
estimated the density of the daily P&L using 100 000 simulations. IR corresponds to the
case when risk factors are only the interest rates⁷⁵. The case IR/S considers that both
\(R_t(t_m)\) and \(\lambda_t\) are risk factors whereas \(\mathcal{R}_t\) is assumed to be constant. Finally, we include
the recovery risk in the case IR/S/RR. Using 10 million simulations, we find that the daily
value-at-risk is equal to $4 730 (IR), $13 460 (IR/S) and $18 360 (IR/S/RR). We see the
impact of taking into account default risk in the calculation of the value-at-risk.


72 We have previously found that the exact VaR is equal to $123.91.

73 These techniques are presented in Chapter 13.
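The pricing formula with an exponential default time can be sketched as follows. The zero-coupon curve of Example 15 is not reproduced in this section, so the flat 1.5% curve, the coupon schedule and the function names below are hypothetical; the sketch therefore does not attempt to reproduce the price of $109.75.

```python
import math


def defaultable_bond_price(coupons, notional, maturity, zero_rate, intensity, recovery):
    """Bond price when the recovery is paid at maturity and the default time is exponential.

    coupons   : list of (time to payment in years, coupon amount)
    zero_rate : function mapping a time to payment to a continuously compounded rate
    """
    def discount(tau):
        return math.exp(-tau * zero_rate(tau))

    def survival(tau):
        return math.exp(-intensity * tau)

    coupon_leg = sum(c * discount(tau) * survival(tau) for tau, c in coupons)
    s_maturity = survival(maturity)
    notional_leg = notional * discount(maturity) * (s_maturity + recovery * (1.0 - s_maturity))
    return coupon_leg + notional_leg


def flat_curve(tau):
    """Hypothetical flat zero-coupon curve at 1.5%."""
    return 0.015


# Hypothetical five-year bond paying an annual coupon of 5 on a notional of 100
coupons = [(float(year), 5.0) for year in range(1, 6)]

risk_free = defaultable_bond_price(coupons, 100.0, 5.0, flat_curve, intensity=0.0, recovery=0.5)
risky = defaultable_bond_price(coupons, 100.0, 5.0, flat_curve, intensity=0.02, recovery=0.5)
print(f"price without default risk = {risk_free:.2f}, price with default risk = {risky:.2f}")
```

A Monte Carlo value-at-risk would then reprice the bond under simulated shocks on the rates, the intensity and the recovery rate, and apply the order-statistic estimator to the simulated P&L.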


2.2.5 The case of options and derivatives 


Special attention should be paid to portfolios of derivatives, because their risk man- 
agement is much more complicated than a long-only portfolio of traditional assets (Duffie 
and Pan, 1997). They involve non-linear exposures to risk factors that are difficult to mea- 
sure, they are sensitive to parameters that are not always observable and they are generally 
traded on OTC markets. In this section, we provide an overview of the challenges that arise 
when measuring and managing the risk of these assets. Chapter 9 complements it with a 
more exhaustive treatment of hedging and pricing issues as well as model risk. 


74 We recall that it was equal to $115.47.

75 This implies that we set \(\Delta_h \lambda_{t+h}\) and \(\Delta_h \mathcal{R}_{t+h}\) to zero in the Monte Carlo procedure.

FIGURE 2.16: Probability density function of the daily P&L with credit risk


2.2.5.1 Identification of risk factors 


Let us consider an example of a portfolio containing \(w_S\) stocks and \(w_C\) call options on
this stock. We note \(S_t\) and \(C_t\) the stock and option prices at time \(t\). The P&L for the holding
period \(h\) is equal to:

\[
\Pi(w) = w_S (S_{t+h} - S_t) + w_C (C_{t+h} - C_t)
\]

If we use asset returns as risk factors, we get:

\[
\Pi(w) = w_S S_t R_{S,t+h} + w_C C_t R_{C,t+h}
\]

where \(R_{S,t+h}\) and \(R_{C,t+h}\) are the returns of the stock and the option for the period \([t, t+h]\).
In this approach, we identify two risk factors. The problem is that the option price \(C_t\) is a
non-linear function of the underlying price \(S_t\):

\[
C_t = f_C(S_t)
\]

This implies that:

\[
\begin{aligned}
\Pi(w) &= w_S S_t R_{S,t+h} + w_C \left(f_C(S_{t+h}) - C_t\right) \\
&= w_S S_t R_{S,t+h} + w_C \left(f_C\left(S_t (1 + R_{S,t+h})\right) - C_t\right)
\end{aligned}
\]

The P&L depends then on a single risk factor \(R_S\). We notice that we can write the return
of the option price as a non-linear function of the stock return:

\[
R_{C,t+h} = \frac{f_C\left(S_t (1 + R_{S,t+h})\right) - C_t}{C_t}
\]

The problem is that the probability distribution of \(R_C\) is non-stationary and depends on
the value of \(S_t\). Therefore, the risk factors cannot be the random vector \((R_S, R_C)\) because
they require too complex modeling.

Risk factors are often explicit in primary financial assets (equities, bonds, currencies),
which is not the case with derivatives. Previously, we have identified the return of the
underlying asset as a risk factor for the call option. In the Black-Scholes model, the price
of the call option is given by:


\[
C_{\mathrm{BS}}(S_t, K, \Sigma_t, T, b_t, r_t) = S_t e^{(b_t - r_t)\tau}\, \Phi(d_1) - K e^{-r_t \tau}\, \Phi(d_2) \tag{2.10}
\]

where \(S_t\) is the current price of the underlying asset, \(K\) is the option strike, \(\Sigma_t\) is the
volatility parameter, \(T\) is the maturity date, \(b_t\) is the cost-of-carry⁷⁶ and \(r_t\) is the interest
rate. The parameter \(\tau = T - t\) is the time to maturity whereas the coefficients \(d_1\) and \(d_2\)
are defined as follows:

\[
\begin{aligned}
d_1 &= \frac{1}{\Sigma_t \sqrt{\tau}} \left(\ln \frac{S_t}{K} + b_t \tau\right) + \frac{1}{2} \Sigma_t \sqrt{\tau} \\
d_2 &= d_1 - \Sigma_t \sqrt{\tau}
\end{aligned}
\]

We can then write the option price as follows:

\[
C_t = f_{\mathrm{BS}}(\theta_{\mathrm{contract}}; \theta)
\]

where \(\theta_{\mathrm{contract}}\) are the parameters of the contract (strike \(K\) and maturity \(T\)) and \(\theta\) are
the other parameters that can be objective, such as the underlying price \(S_t\), or subjective, such as the
volatility \(\Sigma_t\). Any one of these parameters \(\theta\) may serve as risk factors:


• \(S_t\) is obviously a risk factor;
• if \(\Sigma_t\) is not constant, the option price may be sensitive to the volatility risk;
• the option may be impacted by changes in the interest rate or the cost-of-carry.


The risk manager faces here a big issue, because the risk measure will depend on the
choice of the risk factors⁷⁷. A typical example is the volatility parameter. We observe a
difference between the historical volatility \(\hat{\sigma}_t\) and the Black-Scholes volatility \(\Sigma_t\). Because
this implied volatility is not a market price, its value will depend on the option model and
the assumptions which are required to calibrate it. For instance, it will be different if we
use a stochastic volatility model or a local volatility model. Even if two banks use the same
model, they will certainly obtain two different values of the implied volatility, because there
is little possibility that they exactly follow the same calibration procedure.


With the underlying asset \(S_t\), the implied volatility \(\Sigma_t\) is the most important risk factor,
but other risk factors may be determinant. They concern the dividend risk for equity options, 
the yield curve risk for interest rate options, the term structure for commodity options or 
the correlation risk for basket options. In fact, the choice of risk factors is not always obvious 
because it is driven by the pricing model and the characteristics of the option. We will take 
a closer look at this point in Chapter 9. 


2.2.5.2 Methods to calculate VaR and ES risk measures 


The method of full pricing To calculate the value-at-risk or the expected shortfall of 
option portfolios, we use the same approaches as previously. The difference with primary 


⁷⁶ The cost-of-carry depends on the underlying asset. We have \(b_t = r_t\) for non-dividend stocks and total
return indices, \(b_t = r_t - \delta_t\) for stocks paying a continuous dividend yield \(\delta_t\), \(b_t = 0\) for forward and futures
contracts and \(b_t = r_t - r_t^{\star}\) for foreign exchange options where \(r_t^{\star}\) is the foreign interest rate.

⁷⁷ We encounter the same difficulties for pricing and hedging purposes.

financial assets comes from the pricing function which is non-linear and more complex.
In the case of historical and Monte Carlo methods, the P&L of the \(s\)-th scenario has the
following expression:

\[ \Pi_s(w) = g(F_{1,s}, \ldots, F_{m,s}; w) - P_t(w) \]


In the case of the introductory example, the P&L then becomes:


\[ \Pi_s(w) = \begin{cases} w_S S_t R_s + w_C \left( f_C(S_t (1 + R_s); \Sigma_t) - C_t \right) & \text{with one risk factor} \\ w_S S_t R_s + w_C \left( f_C(S_t (1 + R_s), \Sigma_s) - C_t \right) & \text{with two risk factors} \end{cases} \]


where \(R_s\) and \(\Sigma_s\) are the asset return and the implied volatility generated by the \(s\)-th scenario.
If we assume that the interest rate and the cost-of-carry are constant, the pricing function
is:

\[ f_C(S; \Sigma) = C_{\mathrm{BS}}(S, K, \Sigma, T - h, b_t, r_t) \]


and we notice that the remaining maturity of the option decreases by \(h\) days. In the model
with two risk factors, we have to simulate the underlying price and the implied volatility.
For the single factor model, we use the current implied volatility \(\Sigma_t\) instead of the simulated
value \(\Sigma_s\).


Example 18 We consider a long position on 100 call options with strike \(K = 100\). The
value of the call option is $4.14, the residual maturity⁷⁸ is 52 days and the current price of
the underlying asset is $100. We assume that \(\Sigma_t = 20\%\) and \(b_t = r_t = 5\%\). The objective is to
calculate the daily value-at-risk with a 99% confidence level and the daily expected shortfall
with a 97.5% confidence level. For that, we consider 250 historical scenarios, whose first
nine values (in %) are the following:


s                 1      2      3      4      5      6      7      8      9
R_s (in %)    −1.93  −0.69  −0.71  −0.73   1.22   1.01   1.04   1.08  −1.61
ΔΣ_s (in %)   −4.42  −1.32  −3.04   2.88  −0.13  −0.08   1.29   2.93   0.85


TABLE 2.12: Daily P&L of the long position on the call option when the risk factor is
the underlying price


s   R_s (in %)   S_{t+h}   C_{t+h}       Π_s
1     −1.93       98.07     3.09     −104.69
2     −0.69       99.31     3.72      −42.16
3     −0.71       99.29     3.71      −43.22
4     −0.73       99.27     3.70      −44.28
5      1.22      101.22     4.81       67.46
6      1.01      101.01     4.68       54.64
7      1.04      101.04     4.70       56.46
8      1.08      101.08     4.73       58.89
9     −1.61       98.39     3.25      −89.22


Using the price and the characteristics of the call option, we can show that the implied
volatility \(\Sigma_t\) is equal to 19.99% (rounded to 20%). We first consider the case of the single
risk factor. In Table 2.12, we show the values of the P&L for the first nine scenarios. As an
illustration, we provide the detailed calculation for the first scenario. The asset return \(R_s\)


⁷⁸ We assume that there are 252 trading days per year.

is equal to −1.93%, thus implying that the asset price \(S_{t+h}\) is equal to 100 × (1 − 1.93%) =
98.07. The residual maturity \(\tau\) is equal to 51/252 years. It follows that:

\[ d_1 = \frac{1}{20\% \times \sqrt{51/252}} \left( \ln \frac{98.07}{100} + 5\% \times \frac{51}{252} \right) + \frac{1}{2} \times 20\% \times \sqrt{\frac{51}{252}} = -0.0592 \]

and:

\[ d_2 = -0.0592 - 20\% \times \sqrt{\frac{51}{252}} = -0.1491 \]


We deduce that:


\[ \begin{aligned} C_{t+h} &= 98.07 \times e^{(5\% - 5\%) \times \frac{51}{252}} \times \Phi(-0.0592) - 100 \times e^{-5\% \times \frac{51}{252}} \times \Phi(-0.1491) \\ &= 98.07 \times 1.00 \times 0.4764 - 100 \times 0.99 \times 0.4407 \\ &= 3.093 \end{aligned} \]


The simulated P&L for the first historical scenario is then equal to:

\[ \Pi_s = 100 \times (3.093 - 4.14) = -104.69 \]


Based on the 250 historical scenarios, the 99% value-at-risk is equal to $154.79, whereas the 
97.5% expected shortfall is equal to $150.04. 


Remark 13 In Figure 2.17, we illustrate that the option return \(R_C\) is not a new risk factor.
We plot \(R_S\) against \(R_C\) for the 250 historical scenarios. The points are on the curve of the
Black-Scholes formula. The correlation between the two returns is equal to 99.78%, which
indicates that \(R_S\) and \(R_C\) are highly dependent. However, this dependence is non-linear
for large positive or negative asset returns. The figure also shows the leverage effect of the
call option, because \(R_C\) is not of the same order of magnitude as \(R_S\). This illustrates the
non-linear characteristic of options. A linear position with a volatility equal to 20% implies
a daily VaR around 3%. In our example, the VaR is equal to 37.4% of the portfolio value,
which corresponds to a linear exposure in a stock with a volatility of 259%!


Let us consider the case with two risk factors when the implied volatility changes from
\(t\) to \(t + h\). We assume that the absolute variation of the implied volatility is the right risk
factor to model the future implied volatility. It follows that:


\[ \Sigma_{t+h} = \Sigma_t + \Delta\Sigma_s \]


In Table 2.13, we indicate the value taken by \(\Sigma_{t+h}\) for the first nine scenarios. This allows
us to price the call option and deduce the P&L. For instance, the call option price becomes⁷⁹
$2.32 instead of $3.09 for \(s = 1\) because the implied volatility has decreased. Finally, the
99% value-at-risk is equal to $181.70 and is larger than the previous one due to the second
risk factor⁸⁰.
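The full pricing calculation for the first scenario can be sketched in a few lines of Python. The helper names (`bs_call`, `full_pricing_pnl`) are hypothetical; the inputs are those of Example 18, and the option is re-priced with a residual maturity of 51/252 years, either at the current implied volatility (one risk factor) or at the shifted volatility (two risk factors).

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, vol, tau, carry, rate):
    """Black-Scholes call price with cost-of-carry, as in Equation (2.10)."""
    vol_sqrt_tau = vol * sqrt(tau)
    d1 = (log(spot / strike) + carry * tau) / vol_sqrt_tau + 0.5 * vol_sqrt_tau
    d2 = d1 - vol_sqrt_tau
    return (spot * exp((carry - rate) * tau) * norm_cdf(d1)
            - strike * exp(-rate * tau) * norm_cdf(d2))


# Inputs of Example 18
N_OPTIONS = 100           # long position on 100 calls
C_T = 4.14                # current option value
S_T, STRIKE = 100.0, 100.0
SIGMA_T = 0.20            # current implied volatility
CARRY = RATE = 0.05
TAU_NEXT = 51 / 252       # residual maturity after one trading day


def full_pricing_pnl(asset_return: float, vol_change: float = 0.0) -> float:
    """Re-price the option under one scenario and return the position P&L."""
    spot_next = S_T * (1.0 + asset_return)
    vol_next = SIGMA_T + vol_change   # absolute variation of the implied volatility
    c_next = bs_call(spot_next, STRIKE, vol_next, TAU_NEXT, CARRY, RATE)
    return N_OPTIONS * (c_next - C_T)


# First scenario: R_s = -1.93% and a volatility change of -4.42 points
pnl_one_factor = full_pricing_pnl(-0.0193)
pnl_two_factors = full_pricing_pnl(-0.0193, -0.0442)
print(f"one factor:  {pnl_one_factor:.2f}")
print(f"two factors: {pnl_two_factors:.2f}")
```

Applying `full_pricing_pnl` to each of the 250 scenarios and taking the relevant order statistics of the resulting P&Ls gives the historical value-at-risk and expected shortfall.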


The method of sensitivities The previous approach is called full pricing, because it
consists in re-pricing the option. In the method based on the Greek coefficients, the idea is
to approximate the change in the option price by the Taylor expansion. For instance, we
define the delta approach as follows⁸¹:


\[ C_{t+h} - C_t \approx \Delta_t (S_{t+h} - S_t) \]


⁷⁹ We have \(d_1 = -0.0986\), \(d_2 = -0.1687\), \(\Phi(d_1) = 0.4607\), \(\Phi(d_2) = 0.4330\) and \(C_{t+h} = 2.318\).
⁸⁰ For the expected shortfall, we have \(\mathrm{ES}_{97.5\%}(w; \text{one day}) = \$172.09\).
⁸¹ We write the call price as the function \(C_{\mathrm{BS}}(S_t, \Sigma_t, T)\).

[Figure: option return \(R_C\) plotted against the asset return \(R_S\) (in %); series: historical scenarios, Black-Scholes formula, OLS regression]

FIGURE 2.17: Relationship between the asset return \(R_S\) and the option return \(R_C\)


where \(\Delta_t\) is the option delta:


\[ \Delta_t = \frac{\partial C_{\mathrm{BS}}(S_t, \Sigma_t, T)}{\partial S_t} \]


This approximation consists in replacing the non-linear exposure by a linear exposure with
respect to the underlying price. As noted by Duffie and Pan (1997), this approach is not
satisfactory because it is not accurate for large changes in the underlying price that are
the most useful scenarios for calculating the risk measure. The delta approach may be
implemented for the three VaR/ES methods. For instance, the Gaussian VaR of the call
option is:


\[ \mathrm{VaR}_{\alpha}(w; h) = \Phi^{-1}(\alpha) \times |\Delta_t| \times S_t \times \sigma(R_{S,t+h}) \]


whereas the Gaussian ES of the call option is:


\[ \mathrm{ES}_{\alpha}(w; h) = \frac{\phi(\Phi^{-1}(\alpha))}{1 - \alpha} \times |\Delta_t| \times S_t \times \sigma(R_{S,t+h}) \]


TABLE 2.13: Daily P&L of the long position on the call option when the risk factors are
the underlying price and the implied volatility


s   R_s (in %)   S_{t+h}   ΔΣ_s (in %)   Σ_{t+h} (in %)   C_{t+h}       Π_s
1     −1.93       98.07      −4.42           15.58          2.32     −182.25
2     −0.69       99.31      −1.32           18.68          3.48      −65.61
3     −0.71       99.29      −3.04           16.96          3.17      −97.23
4     −0.73       99.27       2.88           22.88          4.21        6.87
5      1.22      101.22      −0.13           19.87          4.79       65.20
6      1.01      101.01      −0.08           19.92          4.67       53.24
7      1.04      101.04       1.29           21.29          4.93       79.03
8      1.08      101.08       2.93           22.93          5.24      110.21
9     −1.61       98.39       0.85           20.85          3.40      −74.21


If we consider the introductory example, we have:


\[ \begin{aligned} \Pi(w) &= w_S (S_{t+h} - S_t) + w_C (C_{t+h} - C_t) \\ &\approx (w_S + w_C \Delta_t)(S_{t+h} - S_t) \\ &= (w_S + w_C \Delta_t) S_t R_{S,t+h} \end{aligned} \]


With the delta approach, we aggregate the risk by netting the different delta exposures⁸².
In particular, the portfolio is delta neutral if the net exposure is zero:

\[ w_S + w_C \Delta_t = 0 \Leftrightarrow w_S = -w_C \Delta_t \]


With the delta approach, the VaR/ES of delta neutral portfolios is then equal to zero.
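The Gaussian delta approach and the netting of delta exposures can be illustrated with the short Python sketch below. The function names are hypothetical, and two inputs are assumptions made only for the illustration: the delta of 0.5632 and a daily return volatility obtained by scaling a 20% annual volatility by \(\sqrt{252}\).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def net_delta(w_stock: float, w_call: float, delta: float) -> float:
    """Net delta exposure w_S + w_C * Delta_t of the stock/call portfolio."""
    return w_stock + w_call * delta


def gaussian_delta_var(alpha, exposure, spot, sigma_return):
    """Gaussian VaR of a linear (delta) exposure to the underlying."""
    return STD_NORMAL.inv_cdf(alpha) * abs(exposure) * spot * sigma_return


def gaussian_delta_es(alpha, exposure, spot, sigma_return):
    """Gaussian ES of a linear (delta) exposure to the underlying."""
    z = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.pdf(z) / (1.0 - alpha) * abs(exposure) * spot * sigma_return


# Assumed inputs for illustration
delta = 0.5632
spot = 100.0
daily_sigma = 0.20 / sqrt(252)   # assumption: square-root-of-time scaling

# Long 100 calls, no stock
exposure = net_delta(w_stock=0.0, w_call=100.0, delta=delta)
var_99 = gaussian_delta_var(0.99, exposure, spot, daily_sigma)
es_975 = gaussian_delta_es(0.975, exposure, spot, daily_sigma)

# Delta-neutral portfolio: the stock position offsets the option delta
w_call = 100.0
w_stock = -w_call * delta
var_neutral = gaussian_delta_var(0.99, net_delta(w_stock, w_call, delta),
                                 spot, daily_sigma)

print(f"VaR 99%: {var_99:.2f}, ES 97.5%: {es_975:.2f}, delta-neutral VaR: {var_neutral:.2f}")
```

The zero risk measure of the delta-neutral portfolio is an artifact of the first-order approximation: the portfolio still carries gamma, theta and vega exposures that the delta approach ignores.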


[Figure: option price \(C_{t+h}\) obtained by re-pricing, by the delta approximation and by the delta-gamma approximation, together with the re-pricing curve for h = 30 days]


FIGURE 2.18: Approximation of the option price with the Greek coefficients


To overcome this drawback, we can use the second-order approximation or the delta-gamma
approach:


\[ C_{t+h} - C_t \approx \Delta_t (S_{t+h} - S_t) + \frac{1}{2} \Gamma_t (S_{t+h} - S_t)^2 \]


where \(\Gamma_t\) is the option gamma:


\[ \Gamma_t = \frac{\partial^2 C_{\mathrm{BS}}(S_t, \Sigma_t, T)}{\partial S_t^2} \]


⁸² A long (or short) position on the underlying asset is equivalent to \(\Delta_t = 1\) (or \(\Delta_t = -1\)).

In Figure 2.18, we compare the two Taylor expansions with the re-pricing method when \(h\) is
equal to one trading day. We observe that the delta approach provides a bad approximation
if the future price \(S_{t+h}\) is far from the current price \(S_t\). The inclusion of the gamma helps
to correct the pricing error. However, if the time period \(h\) is long, the two approximations
may be inaccurate even in the neighborhood of \(S_t\) (see the case h = 30 days in Figure 2.18).
It is therefore important to take into account the time or maturity effect:


\[ C_{t+h} - C_t \approx \Delta_t (S_{t+h} - S_t) + \frac{1}{2} \Gamma_t (S_{t+h} - S_t)^2 + \Theta_t h \]


where \(\Theta_t = \partial_t C_{\mathrm{BS}}(S_t, \Sigma_t, T)\) is the option theta⁸³.


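The Taylor expansion can be generalized to a set of risk factors \(F_t = (F_{1,t}, \ldots, F_{m,t})\):

\[ C_{t+h} - C_t \approx \sum_{j=1}^{m} \frac{\partial C_t}{\partial F_{j,t}} (F_{j,t+h} - F_{j,t}) + \frac{1}{2} \sum_{j=1}^{m} \sum_{k=1}^{m} \frac{\partial^2 C_t}{\partial F_{j,t} \, \partial F_{k,t}} (F_{j,t+h} - F_{j,t})(F_{k,t+h} - F_{k,t}) \]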

The delta-gamma-theta approach consists in considering the underlying price and the
maturity as risk factors. If we add the implied volatility as a new risk factor, we obtain:


\[ C_{t+h} - C_t \approx \Delta_t (S_{t+h} - S_t) + \frac{1}{2} \Gamma_t (S_{t+h} - S_t)^2 + \Theta_t h + \upsilon_t (\Sigma_{t+h} - \Sigma_t) \]


where \(\upsilon_t = \partial_{\Sigma_t} C_{\mathrm{BS}}(S_t, \Sigma_t, T)\) is the option vega. Here, we have considered that only the
second derivative of \(C_t\) with respect to \(S_t\) is significant, but we could also include the vanna
or volga effect⁸⁴.


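In the case of the call option, the Black-Scholes sensitivities are equal to:


\[ \begin{aligned} \Delta_t &= e^{(b_t - r_t)\tau} \Phi(d_1) \\ \Gamma_t &= \frac{e^{(b_t - r_t)\tau} \phi(d_1)}{S_t \Sigma_t \sqrt{\tau}} \\ \Theta_t &= -r_t K e^{-r_t \tau} \Phi(d_2) - \frac{1}{2\sqrt{\tau}} S_t \Sigma_t e^{(b_t - r_t)\tau} \phi(d_1) - (b_t - r_t) S_t e^{(b_t - r_t)\tau} \Phi(d_1) \\ \upsilon_t &= e^{(b_t - r_t)\tau} S_t \sqrt{\tau} \phi(d_1) \end{aligned} \]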
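The four sensitivities above translate directly into code. The sketch below is a plain transcription of these formulas; the function name `bs_call_greeks` and the returned dictionary layout are assumptions made for the illustration, and the inputs are those of Example 18 (at-the-money call, 52 trading days, 20% volatility, \(b_t = r_t = 5\%\)).

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal density phi."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def bs_call_greeks(spot, strike, vol, tau, carry, rate):
    """Delta, gamma, theta and vega of the call option (per year, per unit of vol)."""
    sqrt_tau = sqrt(tau)
    d1 = (log(spot / strike) + carry * tau) / (vol * sqrt_tau) + 0.5 * vol * sqrt_tau
    d2 = d1 - vol * sqrt_tau
    growth = exp((carry - rate) * tau)
    delta = growth * norm_cdf(d1)
    gamma = growth * norm_pdf(d1) / (spot * vol * sqrt_tau)
    theta = (-rate * strike * exp(-rate * tau) * norm_cdf(d2)
             - spot * vol * growth * norm_pdf(d1) / (2.0 * sqrt_tau)
             - (carry - rate) * spot * growth * norm_cdf(d1))
    vega = growth * spot * sqrt_tau * norm_pdf(d1)
    return {"delta": delta, "gamma": gamma, "theta": theta, "vega": vega}


greeks = bs_call_greeks(spot=100.0, strike=100.0, vol=0.20, tau=52 / 252,
                        carry=0.05, rate=0.05)
for name, value in greeks.items():
    print(f"{name:>5s} = {value:.4f}")
```

Theta is expressed per year and vega per unit of volatility, so a one-day P&L uses \(\Theta_t \times 1/252\) and a volatility move of one point uses \(\upsilon_t \times 1\%\).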


If we consider again Example 18 on page 95, we obtain⁸⁵ \(\Delta_t = 0.5632\), \(\Gamma_t = 0.0434\),
\(\Theta_t = -11.2808\) and \(\upsilon_t = 17.8946\). In Table 2.14, we have reported the approximated P&Ls
for the first nine scenarios and the one-factor model. The fourth column indicates the P&L
obtained by the full pricing method, which were already reported in Table 2.12. \(\Pi_s^{\Delta}(w)\),
\(\Pi_s^{\Delta+\Gamma}(w)\) and \(\Pi_s^{\Delta+\Gamma+\Theta}(w)\) correspond respectively to the delta, delta-gamma and delta-gamma-theta
approaches. For example, we have \(\Pi_1^{\Delta}(w) = 100 \times 0.5632 \times (98.07 - 100) = -108.69\),
\(\Pi_1^{\Delta+\Gamma}(w) = -108.69 + 100 \times \frac{1}{2} \times 0.0434 \times (98.07 - 100)^2 = -100.61\) and \(\Pi_1^{\Delta+\Gamma+\Theta}(w) =\)


⁸³ An equivalent formula is \(\Theta_t = -\partial_T C_{\mathrm{BS}}(S_t, \Sigma_t, T) = -\partial_{\tau} C_{\mathrm{BS}}(S_t, \Sigma_t, T)\) because the maturity \(T\) (or
the time to maturity \(\tau\)) is moving in the opposite way with respect to the time \(t\).

⁸⁴ The vanna coefficient corresponds to the cross-derivative of \(C_t\) with respect to \(S_t\) and \(\Sigma_t\) whereas the
volga effect is the second derivative of \(C_t\) with respect to \(\Sigma_t\).

⁸⁵ We have \(d_1 = 0.1590\), \(\Phi(d_1) = 0.5632\), \(\phi(d_1) = 0.3939\), \(d_2 = 0.0681\) and \(\Phi(d_2) = 0.5272\).




\(-100.61 - 100 \times 11.2808 \times 1/252 = -105.09\). We notice that we obtain a good approximation
with the delta, but it is more accurate to combine delta, gamma and theta sensitivities.
Finally, the 99% VaRs for a one-day holding period are $171.20, $151.16 and $155.64 for
the three approaches. The delta-gamma-theta approach gives the closest result⁸⁶. If the set of risk
factors includes the implied volatility, we obtain the results in Table 2.15. We notice that
the vega effect is very significant (fifth column). As an illustration, we have \(\Pi_1^{\upsilon}(w) =
100 \times 17.8946 \times (15.58\% - 20\%) = -79.09\), implying that the volatility risk explains 43.4%
of the loss of $182.25 for the first scenario. Finally, the VaR is equal to $183.76 with the
delta-gamma-theta-vega approach whereas we found previously that it was equal to $181.70



with the full pricing method. 
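The successive Taylor approximations for the first scenario can be reproduced with the rounded Greek coefficients quoted in the text. The sketch below is only illustrative: the function name `greek_pnl` and its keyword arguments are hypothetical, and because the Greeks are rounded to four decimals the results may differ from the reported figures in the second decimal place.

```python
# Rounded sensitivities and position data taken from Example 18
N_OPTIONS = 100
DELTA, GAMMA, THETA, VEGA = 0.5632, 0.0434, -11.2808, 17.8946
S_T = 100.0            # current underlying price
SIGMA_T = 0.20         # current implied volatility
H = 1 / 252            # one trading day, in years


def greek_pnl(spot_next, vol_next=SIGMA_T, use_gamma=True, use_theta=True):
    """Approximate P&L of the option position with the Greek coefficients."""
    d_spot = spot_next - S_T
    pnl = DELTA * d_spot
    if use_gamma:
        pnl += 0.5 * GAMMA * d_spot ** 2
    if use_theta:
        pnl += THETA * H
    pnl += VEGA * (vol_next - SIGMA_T)   # zero when the volatility is unchanged
    return N_OPTIONS * pnl


# First scenario: S_{t+h} = 98.07 and, with two factors, an implied volatility of 15.58%
pnl_delta = greek_pnl(98.07, use_gamma=False, use_theta=False)
pnl_delta_gamma = greek_pnl(98.07, use_theta=False)
pnl_delta_gamma_theta = greek_pnl(98.07)
pnl_with_vega = greek_pnl(98.07, vol_next=0.1558)

print(f"delta:                  {pnl_delta:.2f}")
print(f"delta-gamma:            {pnl_delta_gamma:.2f}")
print(f"delta-gamma-theta:      {pnl_delta_gamma_theta:.2f}")
print(f"delta-gamma-theta-vega: {pnl_with_vega:.2f}")
```

Note that the theta term is scaled by the number of options like the other terms, which is what brings the delta-gamma P&L of −100.61 down to −105.09.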


TABLE 2.14: Calculation of the P&L based on the Greek sensitivities


s   R_s (in %)   S_{t+h}       Π_s     Π_s^Δ   Π_s^{Δ+Γ}   Π_s^{Δ+Γ+Θ}
1     −1.93       98.07    −104.69   −108.69    −100.61      −105.09
2     −0.69       99.31     −42.16    −38.86     −37.83       −42.30
3     −0.71       99.29     −43.22    −39.98     −38.89       −43.37
4     −0.73       99.27     −44.28    −41.11     −39.96       −44.43
5      1.22      101.22      67.46     68.71      71.93        67.46
6      1.01      101.01      54.64     56.88      59.09        54.61
7      1.04      101.04      56.46     58.57      60.91        56.44
8      1.08      101.08      58.89     60.82      63.35        58.87
9     −1.61       98.39     −89.22    −90.67     −85.05       −89.53
VaR_99% (w; one day)        154.79    171.20     151.16       155.64
ES_97.5% (w; one day)       150.04    165.10     146.37       150.84


TABLE 2.15: Calculation of the P&L using the vega coefficient


s   S_{t+h}   Σ_{t+h} (in %)       Π_s     Π_s^υ   Π_s^{Δ+υ}   Π_s^{Δ+Γ+υ}   Π_s^{Δ+Γ+Θ+υ}
1    98.07        15.58        −182.25    −79.09    −187.78      −179.71        −184.19
2    99.31        18.68         −65.61    −23.62     −62.48       −61.45         −65.92
3    99.29        16.96         −97.23    −54.40     −94.38       −93.29         −97.77
4    99.27        22.88           6.87     51.54      10.43        11.58           7.10
5   101.22        19.87          65.20     −2.33      66.38        69.61          65.13
6   101.01        19.92          53.24     −1.43      55.45        57.66          53.18
7   101.04        21.29          79.03     23.08      81.65        84.00          79.52
8   101.08        22.93         110.21     52.43     113.25       115.78         111.30
9    98.39        20.85         −74.21     15.21     −75.46       −69.84         −74.32
VaR_99% (w; one day)            181.70    77.597     190.77       179.29         183.76
ES_97.5% (w; one day)           172.09     73.90     184.90       169.34         173.81


Remark 14 We do not present here the non-linear quadratic VaR, which consists in
computing the VaR of option portfolios with the Cornish-Fisher expansion (Zangari, 1996;
Britten-Jones and Schaefer, 1999). It is called ‘quadratic’ because it uses the delta-gamma
approximation and requires calculating the moments of the quadratic form \((S_{t+h} - S_t)^2\).
The treatment of this approach is left as Exercise 2.4.8 on page 123.


⁸⁶ We found previously that the VaR was equal to $154.79 with the full pricing method.




The hybrid method On the one hand, the full pricing method has the advantage of
being accurate, but also the drawback of being time-consuming because it performs a complete
revaluation of the portfolio for each scenario. On the other hand, the method based on the
sensitivities is less accurate, but also faster than the re-pricing approach. Indeed, the Greek
coefficients are calculated once and for all, and their values do not depend on the scenario.
The hybrid method consists of combining the two approaches:


1. we first calculate the P&L for each (historical or simulated) scenario with the method 
based on the sensitivities; 


2. we then identify the worst scenarios; 
3. we finally revalue these worst scenarios by using the full pricing method. 


The underlying idea is to consider the faster approach to locate the value-at-risk, and then 
to use the most accurate approach to calculate the right value. 


TABLE 2.16: The 10 worst scenarios identified by the hybrid method


       Full pricing   |                      Greeks
                      |   Δ−Γ−Θ−υ     |     Δ−Θ       |    Δ−Θ−υ
 i      s       Π_s   |   s      Π_s  |   s      Π_s  |   s      Π_s
 1    100   −183.86   | 100  −186.15  | 182  −187.50  | 134  −202.08
 2      1   −182.25   |   1  −184.19  | 169  −176.80  | 100  −198.22
 3    134   −181.15   | 134  −183.34  |  27  −174.55  |   1  −192.26
 4     27   −163.01   |  27  −164.26  | 134  −170.05  | 169  −184.32
 5    169   −162.82   | 169  −164.02  |  69  −157.66  |  27  −184.04
 6    194   −159.46   | 194  −160.93  | 108  −150.90  | 194  −175.36
 7     49   −150.25   |  49  −151.43  | 194  −149.77  |  49  −165.41
 8    245   −145.43   | 245  −146.57  |  49  −147.52  | 182  −164.96
 9    182   −142.21   | 182  −142.06  | 186  −145.27  | 245  −153.37
10     79   −135.55   |  79  −136.52  | 100  −137.38  |  69  −150.68


In Table 2.16, we consider the previous example with the implied volatility as a risk
factor. We have reported the worst scenarios corresponding to the order statistic \(i : n_S\)
with \(i \le 10\). In the case of the full pricing method, the five worst scenarios are the 100th,
1st, 134th, 27th and 169th. This implies that the hybrid method will give the right result
if it is able to select the 100th, 1st and 134th scenarios to compute the value-at-risk, which
corresponds to the average of the second and third order statistics. If we consider the
Δ−Γ−Θ−υ approximation, we identify the same ten worst scenarios. This is perfectly
normal, as it is easy to price a European call option. It will not be the case with exotic
options, because the approximation may not be accurate. For instance, if we consider our
example with the Δ−Θ approximation, the five worst scenarios become the 182nd, 169th,
27th, 134th and 69th. If we revalue these 5 worst scenarios, the 99% value-at-risk is equal
to:


\[ \mathrm{VaR}_{99\%}(w; \text{one day}) = \frac{1}{2} (163.01 + 162.82) = \$162.92 \]


which is a result far from the value of $181.70 found with the full pricing method. With the
10 worst scenarios, we obtain:


\[ \mathrm{VaR}_{99\%}(w; \text{one day}) = \frac{1}{2} (181.15 + 163.01) = \$172.08 \]


Once again, we do not find the exact value, because the Δ−Θ approximation fails to detect
the first scenario among the 10 worst scenarios. This problem vanishes with the Δ−Θ−υ
approximation, even if it gives a ranking different from that obtained with the full pricing
method. In practice, the hybrid approach is widespread and professionals generally use the
identification method with 10 worst scenarios⁸⁷.
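The order-statistic calculations behind these figures can be checked with the data of Table 2.16. The sketch below makes several assumptions that should be read as such: with 250 scenarios the 99% VaR is taken as the average of the second and third largest losses (as stated in the text), the 97.5% ES is taken as the average of the six largest losses (a convention that reproduces the reported $172.09), and scenarios whose full-pricing P&L is not reported in the table (69, 108 and 186) are given a placeholder loss of 135.55, which is only an upper bound since they are not among the ten worst.

```python
# Full-pricing P&L of the ten worst scenarios (Table 2.16), keyed by scenario index
FULL_PRICING_PNL = {100: -183.86, 1: -182.25, 134: -181.15, 27: -163.01,
                    169: -162.82, 194: -159.46, 49: -150.25, 245: -145.43,
                    182: -142.21, 79: -135.55}

# Ranking of the worst scenarios produced by the delta-theta approximation
DELTA_THETA_RANKING = [182, 169, 27, 134, 69, 108, 194, 49, 186, 100]

# Placeholder (assumption): unreported scenarios lose at most 135.55
UNREPORTED_PNL = -135.55


def revalue(scenario: int) -> float:
    """Stand-in for the full pricing of one scenario (table lookup here)."""
    return FULL_PRICING_PNL.get(scenario, UNREPORTED_PNL)


def var_99(pnls) -> float:
    """99% VaR for 250 scenarios: mean of the 2nd and 3rd largest losses."""
    losses = sorted((-p for p in pnls), reverse=True)
    return 0.5 * (losses[1] + losses[2])


def es_975(pnls) -> float:
    """97.5% ES for 250 scenarios: mean of the six largest losses (assumed convention)."""
    losses = sorted((-p for p in pnls), reverse=True)
    return sum(losses[:6]) / 6


full_var = var_99(FULL_PRICING_PNL.values())
full_es = es_975(FULL_PRICING_PNL.values())
hybrid_var_5 = var_99(revalue(s) for s in DELTA_THETA_RANKING[:5])
hybrid_var_10 = var_99(revalue(s) for s in DELTA_THETA_RANKING)

print(f"full pricing: VaR = {full_var:.2f}, ES = {full_es:.2f}")
print(f"hybrid (delta-theta, 5 worst):  VaR = {hybrid_var_5:.2f}")
print(f"hybrid (delta-theta, 10 worst): VaR = {hybrid_var_10:.2f}")
```

Because the hybrid method revalues only a subset of the scenarios, its value-at-risk can never exceed the full pricing value; it matches it only when the approximation selects the scenarios that drive the relevant order statistics.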


2.2.5.3 Backtesting 


When we consider a model to price a product, the valuation is known as ‘mark-to-model’
and requires more attention than the mark-to-market approach. In this last case,
the simulated P&L is the difference between the mark-to-model value at time \(t + 1\) and the
current mark-to-market value:


\[ \Pi_s(w) = \underbrace{P_{t+1}(w)}_{\text{mark-to-model}} - \underbrace{P_t(w)}_{\text{mark-to-market}} \]


At time \(t + 1\), the realized P&L is the difference between two mark-to-market values:


\[ \Pi(w) = \underbrace{P_{t+1}(w)}_{\text{mark-to-market}} - \underbrace{P_t(w)}_{\text{mark-to-market}} \]


For exotic options and OTC derivatives, we don’t have market prices and the portfolio is
valued using the mark-to-model approach. This means that the simulated P&L is the
difference between two mark-to-model values:


\[ \Pi_s(w) = \underbrace{P_{t+1}(w)}_{\text{mark-to-model}} - \underbrace{P_t(w)}_{\text{mark-to-model}} \]


and the realized P&L is also the difference between two mark-to-model values:


\[ \Pi(w) = \underbrace{P_{t+1}(w)}_{\text{mark-to-model}} - \underbrace{P_t(w)}_{\text{mark-to-model}} \]


In the case of the mark-to-model valuation, we see the relevance of the pricing model in
terms of risk management. Indeed, if the pricing model is wrong, the value-at-risk is wrong
too and this cannot be detected by the backtesting procedure, which has little significance.
This is why the supervisory authority places great importance on model risk.


2.2.5.4 Model risk 


Model risk cannot be summarized in a unique definition due to its complexity. For 
instance, Derman (1996, 2001) considers six types of model risk (inapplicability of modeling, 
incorrect model, incorrect solutions, badly approximated solution, bugs and unstable data). 
Rebonato (2001) defines model risk as “the risk of a significant difference between the mark- 
to-model value of an instrument, and the price at which the same instrument is revealed to 
have traded in the market”. According to Morini (2001), these two approaches are different. 
For Riccardo Rebonato, there is not a true value of an instrument before it will be traded on 
the market. Model risk can therefore be measured by selling the instrument in the market. 
For Emanuel Derman, an instrument has an intrinsic true value, but it is unknown. The 
proposition of Rebonato is certainly the right way to define model risk, but it does not 
help to measure model risk from an ex-ante point of view. Moreover, this approach does 


⁸⁷ Its application is less frequent than in the past because computational times have dramatically decreased
with the evolution of technology, in particular the development of parallel computing.

not distinguish between model risk and liquidity risk. The conception of Derman is more
adapted to manage model risk and calibrate the associated provisions. This is the approach 
that has been adopted by banks and regulators. Nevertheless, the multifaceted nature of 
this approach induces very different implementations across banks, because it appears as a 
catalogue with an infinite number of rules. 


We consider a classification with four main types of model risk:

1. the operational risk;
2. the parameter risk;
3. the risk of mis-specification;
4. the hedging risk.


The operational risk is the risk associated with the implementation of the pricer. It concerns
programming mistakes or bugs, but also mathematical errors in closed-form formulas, ap- 
proximations or numerical methods. A typical example is the use of a numerical scheme 
for solving a partial differential equation. The accuracy of the option price and the Greek 
coefficients will depend on the specification of the numerical algorithm (explicit, implicit or 
mixed scheme) and the discretization parameters (time and space steps). Another example 
is the choice of the Monte Carlo method and the number of simulations. 


The parameter risk is the risk associated with the input parameters, in particular those
which are difficult to estimate. A wrong value of one parameter can lead to a mis-pricing, 
even though the model is right and well implemented. In this context, the question of 
available and reliable data is a key issue. It is particularly true when the parameters are 
unobservable and are based on an expert’s opinion. A typical example concerns the value of 
correlations in multi-asset options. Even if there is no problem with data, some parameters 
are indirectly related to market data via a calibration set. In this case, they may change 
with the specification of the calibration set. For instance, the pricing of exotic interest rate 
options is generally based on parameters calibrated from prices of plain vanilla instruments 
(caplets and swaptions). The analysis of parameter risk consists then of measuring the 
impact of parameter changes on the price and the hedging portfolio of the exotic option. 


The risk of mis-specification is the risk associated with the mathematical model, because
it may not include all risk factors, the dynamics of the risk factors is not adequate or the 
dependence between them is not well defined. It is generally easy to highlight this risk, 
because various models calibrated with the same set of instruments can produce different 
prices for the same exotic option. The big issue is to define what is the least bad model. For 
example, in the case of equity options, we have the choice between many models: Black- 
Scholes, local volatility, Heston model, other stochastic volatility models, jump-diffusion, 
etc. In practice, the frontier between the risk of parameters and the risk of mis-specification
may be unclear, as shown by the seminal work on uncertainty in pricing and hedging by
Avellaneda et al. (1995). Moreover, a model which appears to be good for pricing may not
be well adapted for risk management. This explains why the trader and the risk manager
can sometimes use two different models for the same option payoff.


The hedging risk is the risk associated with the trading management of the option portfolio.
The sales margin corresponds to the difference between the transaction price and the mark-
to-model price. The sales margin is calculated at the inception date of the transaction. To
freeze the margin, we have to hedge the option. The mark-to-model value is then transferred
to the option trader and represents the hedging cost. We face here the risk that the realized
hedging cost will be larger than the mark-to-model price. A typical example is a put option,
which has a negative delta. The hedging portfolio corresponds then to a short position on
the underlying asset. Sometimes, this short position may be difficult to implement (e.g.
a ban on short selling) or may be very costly (e.g. due to a change in the bank funding
condition). Some events may also generate a rebalancing risk. The most famous example is
certainly the hedge fund crisis in October 2008, which led to redemption restrictions
or gates. This caused difficulties for traders who managed call options on hedge funds and
were unable to reduce their deltas at this time. The hedging risk does not only concern
the feasibility of the hedging implementation, but also its adequacy with the model. As an
illustration, we suppose that we use a stochastic volatility model for an option, which is
sensitive to the vanna coefficient. The risk manager can then decide to use this model for
measuring the value-at-risk, but the trader can also prefer to implement a Black-Scholes
hedging portfolio⁸⁸. It is not a problem if the risk manager uses a different model than
the trader when the model risk only includes the first three categories. However, it will be a
problem if it also concerns hedging risk.

In the Basel III framework, the Basel Committee highlights the role of the model vali- 
dation team: 


“A distinct unit of the bank that is separate from the unit that designs and 
implements the internal models must conduct the initial and ongoing validation 
of all internal models used to determine market risk capital requirements. The 
model validation unit must validate all internal models used for purposes of the 
IMA on at least an annual basis. [...] Banks must maintain a process to ensure 
that their internal models have been adequately validated by suitably qualified 
parties independent of the model development process to ensure that each model 
is conceptually sound and adequately reflects all material risks. Model validation 
must be conducted both when the model is initially developed and when any 
significant changes are made to the model” (BCBS, 2019, pages 68-69). 


Therefore, model risk justifies that model validation is an integral part of the risk man- 
agement process for exotic options. The tasks of a model validation team are multiple and 
concern reviewing the programming code, checking mathematical formulas and numerical 
approximations, validating market data, testing the calibration stability, challenging the 
pricer with alternative models, proposing provision buffers, etc. This team generally oper- 
ates at the earliest stages of the pricer development (or when the pricer changes), whereas 
the risk manager is involved to follow the product on a daily basis. In Chapter 9, we present 
the different tools available for the model validation unit in order to assess the robustness 
of risk measures that are based on mark-to-model prices. 


Remark 15 It is a mistake to think that model risk is an operational risk. Model risk is 
intrinsically a market risk. Indeed, it exists because exotic options are difficult to price and 
hedge, implying that commercial risk is high. This explains that sales margins are larger 
than for vanilla options and implicitly include model risk, which is therefore inherent to the 
business of exotic derivatives. 


2.3 Risk allocation 


Measuring the risk of a portfolio is a first step to manage it. In particular, a risk measure 
is a single number that is not very helpful for understanding the sources of the portfolio risk. 


⁸⁸ There may be many reasons for implementing simpler hedging portfolios: the trader may be more
confident in the robustness, there is no market instrument to replicate the vanna position, etc.

To go further, we must define precisely the notion of risk contribution in order to propose
risk allocation principles.


Let us consider two trading desks A and B, whose risk measures are respectively \(\mathcal{R}(w_A)\)
and \(\mathcal{R}(w_B)\). At the global level, the risk measure is equal to \(\mathcal{R}(w_{A+B})\). The question is
then how to allocate \(\mathcal{R}(w_{A+B})\) to the trading desks A and B:


\[ \mathcal{R}(w_{A+B}) = \mathcal{RC}_A(w_{A+B}) + \mathcal{RC}_B(w_{A+B}) \]


There is no reason that \(\mathcal{RC}_A(w_{A+B}) = \mathcal{R}(w_A)\) and \(\mathcal{RC}_B(w_{A+B}) = \mathcal{R}(w_B)\) except if there
is no diversification. This question is an important issue for the bank because risk allocation
means capital allocation:


\[ \mathcal{K}(w_{A+B}) = \mathcal{K}_A(w_{A+B}) + \mathcal{K}_B(w_{A+B}) \]


Capital allocation is not neutral, because it will impact the profitability of business units 
that compose the bank. 


Remark 16 This section is based on Chapter 2 of the book of Roncalli (2013). 


2.3.1 Euler allocation principle 


According to Litterman (1996), risk allocation consists in decomposing the risk portfolio 
into a sum of risk contributions by sub-portfolios (assets, trading desks, etc.). The concept 
of risk contribution is key in identifying concentrations and understanding the risk profile of 
the portfolio, and there are different methods for defining them. As illustrated by Denault 
(2001), some methods are more pertinent than others and the Euler principle is certainly 
the most used and accepted one. 


We decompose the P&L as follows:

\[ \Pi = \sum_{i=1}^{n} \Pi_i \]

where \(\Pi_i\) is the P&L of the \(i\)-th sub-portfolio. We note \(\mathcal{R}(\Pi)\) the risk measure associated
with the P&L⁸⁹. Let us consider the risk-adjusted performance measure (RAPM) defined
by⁹⁰:


\[ \mathrm{RAPM}(\Pi) = \frac{\mathbb{E}[\Pi]}{\mathcal{R}(\Pi)} \]

Tasche (2008) considers the portfolio-related RAPM of the \(i\)-th sub-portfolio defined by:


\[ \mathrm{RAPM}(\Pi_i \mid \Pi) = \frac{\mathbb{E}[\Pi_i]}{\mathcal{R}(\Pi_i \mid \Pi)} \]

Based on the notion of RAPM, Tasche (2008) states two properties of risk contributions
that are desirable from an economic point of view:


1. Risk contributions \(\mathcal{R}(\Pi_i \mid \Pi)\) to portfolio-wide risk \(\mathcal{R}(\Pi)\) satisfy the full allocation
property if:


\[ \sum_{i=1}^{n} \mathcal{R}(\Pi_i \mid \Pi) = \mathcal{R}(\Pi) \qquad (2.11) \]


⁸⁹ We recall that \(\mathcal{R}(\Pi) = \mathcal{R}(-L)\).
⁹⁰ This concept is close to the RAROC measure introduced by Bankers Trust (see page 2).

2. Risk contributions \(\mathcal{R}(\Pi_i \mid \Pi)\) are RAPM compatible if there are some \(\varepsilon_i > 0\) such
that⁹¹:


\[ \mathrm{RAPM}(\Pi_i \mid \Pi) > \mathrm{RAPM}(\Pi) \Rightarrow \mathrm{RAPM}(\Pi + h \Pi_i) > \mathrm{RAPM}(\Pi) \qquad (2.12) \]

for all \(0 < h < \varepsilon_i\).


Tasche (2008) shows therefore that if there are risk contributions that are RAPM compatible
in the sense of the two previous properties (2.11) and (2.12), then \(\mathcal{R}(\Pi_i \mid \Pi)\) is uniquely
determined as:

\[ \mathcal{R}(\Pi_i \mid \Pi) = \left. \frac{\mathrm{d}}{\mathrm{d}h} \mathcal{R}(\Pi + h \Pi_i) \right|_{h=0} \qquad (2.13) \]

and the risk measure is homogeneous of degree 1. In the case of a subadditive risk measure,
one can also show that:

\[ \mathcal{R}(\Pi_i \mid \Pi) \le \mathcal{R}(\Pi_i) \qquad (2.14) \]

This means that the risk contribution of the sub-portfolio \(i\) is always smaller than its
stand-alone risk measure. The difference is related to the risk diversification.
Let us return to the risk measure \(\mathcal{R}(w)\) defined in terms of weights. The previous framework
implies that the risk contribution of sub-portfolio \(i\) is uniquely defined as:

\[ \mathcal{RC}_i = w_i \frac{\partial \mathcal{R}(w)}{\partial w_i} \qquad (2.15) \]

and the risk measure satisfies the Euler decomposition:

\[ \mathcal{R}(w) = \sum_{i=1}^{n} w_i \frac{\partial \mathcal{R}(w)}{\partial w_i} = \sum_{i=1}^{n} \mathcal{RC}_i \qquad (2.16) \]


This relationship is also called the Euler allocation principle.


Remark 17 We can always define the risk contributions of a risk measure by using Equa- 
tion (2.15). However, this does not mean that the risk measure satisfies the Euler decompo- 
sition (2.16). 


Remark 18 Kalkbrener (2005) develops an axiomatic approach to risk contribution. In 
particular, he shows that the Euler allocation principle is the only risk allocation method 
compatible with diversification principle (2.14) if the risk measure is subadditive. 


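If we assume that the portfolio return \(R(w)\) is a linear function of the weights \(w\), the
expression of the standard deviation-based risk measure becomes:


\[ \begin{aligned} \mathcal{R}(w) &= -\mu(w) + c \cdot \sigma(w) \\ &= -w^{\top} \mu + c \cdot \sqrt{w^{\top} \Sigma w} \end{aligned} \]


where \(\mu\) and \(\Sigma\) are the mean vector and the covariance matrix of sub-portfolios. It follows
that the vector of marginal risks is:


\[ \begin{aligned} \frac{\partial \mathcal{R}(w)}{\partial w} &= -\mu + c \cdot \frac{1}{2} \left( w^{\top} \Sigma w \right)^{-1/2} (2 \Sigma w) \\ &= -\mu + c \cdot \frac{\Sigma w}{\sqrt{w^{\top} \Sigma w}} \end{aligned} \]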

⁹¹ This property means that assets with a better risk-adjusted performance than the portfolio continue to
have a better RAPM if their allocation increases in a small proportion.




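The risk contribution of the \(i\)-th sub-portfolio is then:

\[ \mathcal{RC}_i = w_i \cdot \left( -\mu_i + c \cdot \frac{(\Sigma w)_i}{\sqrt{w^{\top} \Sigma w}} \right) \]


We verify that the standard deviation-based risk measure satisfies the full allocation
property:


\[ \begin{aligned} \sum_{i=1}^{n} \mathcal{RC}_i &= \sum_{i=1}^{n} w_i \cdot \left( -\mu_i + c \cdot \frac{(\Sigma w)_i}{\sqrt{w^{\top} \Sigma w}} \right) \\ &= w^{\top} \left( -\mu + c \cdot \frac{\Sigma w}{\sqrt{w^{\top} \Sigma w}} \right) \\ &= -w^{\top} \mu + c \cdot \sqrt{w^{\top} \Sigma w} \\ &= \mathcal{R}(w) \end{aligned} \]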

Because Gaussian value-at-risk and expected shortfall are two special cases of the standard
deviation-based risk measure, we conclude that they also satisfy the Euler allocation
principle. In the case of the value-at-risk, the risk contribution becomes:
$$\mathcal{RC}_i = w_i \cdot \left( -\mu_i + \Phi^{-1}(\alpha) \cdot \frac{(\Sigma w)_i}{\sqrt{w^\top \Sigma w}} \right) \tag{2.17}$$
whereas in the case of the expected shortfall, it is equal to:
$$\mathcal{RC}_i = w_i \cdot \left( -\mu_i + \frac{\phi\left( \Phi^{-1}(\alpha) \right)}{(1 - \alpha)} \cdot \frac{(\Sigma w)_i}{\sqrt{w^\top \Sigma w}} \right) \tag{2.18}$$

Remark 19 Even if the risk measure is convex, it does not necessarily satisfy the Euler
allocation principle. The most famous example is the variance of the portfolio return.
We have $\mathrm{var}(w) = w^\top \Sigma w$ and $\partial_w\, \mathrm{var}(w) = 2 \Sigma w$. It follows that
$\sum_{i=1}^{n} w_i \cdot \partial_{w_i} \mathrm{var}(w) = \sum_{i=1}^{n} w_i \cdot (2 \Sigma w)_i = 2 w^\top \Sigma w = 2\, \mathrm{var}(w) > \mathrm{var}(w)$.
In the case of the variance, the sum of the risk contributions is then always larger than the
risk measure itself, because the variance is homogeneous of degree 2 and not of degree 1.
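The full allocation property of the standard deviation-based risk measure, and the failure of the variance to satisfy it, can be checked numerically. The following Python sketch is only an illustration: the weights, means, covariance matrix and the scaling factor c = 2 are hypothetical values that do not come from the text, and the helper names are arbitrary.

```python
import math

def mat_vec(sigma, w):
    """Matrix-vector product Sigma w for nested lists."""
    return [sum(s_ij * w_j for s_ij, w_j in zip(row, w)) for row in sigma]

def sd_risk_measure(w, mu, sigma, c):
    """R(w) = -w' mu + c * sqrt(w' Sigma w)."""
    sw = mat_vec(sigma, w)
    vol = math.sqrt(sum(w_i * sw_i for w_i, sw_i in zip(w, sw)))
    return -sum(w_i * mu_i for w_i, mu_i in zip(w, mu)) + c * vol

def sd_risk_contributions(w, mu, sigma, c):
    """RC_i = w_i * (-mu_i + c * (Sigma w)_i / sqrt(w' Sigma w))."""
    sw = mat_vec(sigma, w)
    vol = math.sqrt(sum(w_i * sw_i for w_i, sw_i in zip(w, sw)))
    return [w_i * (-mu_i + c * sw_i / vol) for w_i, mu_i, sw_i in zip(w, mu, sw)]

def variance_contributions(w, sigma):
    """w_i * d var(w) / d w_i = w_i * (2 Sigma w)_i."""
    return [w_i * 2.0 * sw_i for w_i, sw_i in zip(w, mat_vec(sigma, w))]

# Hypothetical inputs, chosen only for illustration.
w = [0.5, 0.3, 0.2]
mu = [0.05, 0.03, 0.04]
sigma = [[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.16]]
c = 2.0

rc = sd_risk_contributions(w, mu, sigma, c)
risk = sd_risk_measure(w, mu, sigma, c)
var_w = sum(w_i * sw_i for w_i, sw_i in zip(w, mat_vec(sigma, w)))
var_rc = variance_contributions(w, sigma)

print(sum(rc), risk)        # the two numbers coincide: full allocation holds
print(sum(var_rc), var_w)   # the first number is twice the second
```

The first comparison illustrates the Euler decomposition for a risk measure that is homogeneous of degree 1, while the second reproduces the factor 2 of Remark 19.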


Example 19 We consider the Apple/Coca-Cola portfolio that has been used for calculating 
the Gaussian VaR on page 68. We recall that the nominal exposures were $1 093.3 (Apple) 
and $842.8 (Coca-Cola), the estimated standard deviation of daily returns was equal to 
1.3611% for Apple and 0.9468% for Coca-Cola and the cross-correlation of stock returns 
was equal to 12.0787%. 


In the two-asset case, the expression of the value-at-risk or the expected shortfall is:
$$\mathcal{R}(w) = -w_1 \mu_1 - w_2 \mu_2 + c \sqrt{w_1^2 \sigma_1^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2 + w_2^2 \sigma_2^2}$$
It follows that the marginal risk of the first asset is:
$$\mathcal{MR}_1 = -\mu_1 + c\, \frac{w_1 \sigma_1^2 + w_2 \rho \sigma_1 \sigma_2}{\sqrt{w_1^2 \sigma_1^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2 + w_2^2 \sigma_2^2}}$$
We then deduce that the risk contribution of the first asset is:
$$\mathcal{RC}_1 = -w_1 \mu_1 + c\, \frac{w_1^2 \sigma_1^2 + w_1 w_2 \rho \sigma_1 \sigma_2}{\sqrt{w_1^2 \sigma_1^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2 + w_2^2 \sigma_2^2}}$$
By using the numerical values$^{92}$ of Example 19, we obtain the results given in Tables 2.17
and 2.18. We verify that the sum of risk contributions is equal to the risk measure. We
notice that the Apple stock explains 75.14% of the risk whereas it represents 56.47% of the
allocation.


TABLE 2.17: Risk decomposition of the 99% Gaussian value-at-risk

Asset        w_i       MR_i     RC_i     RC_i*
Apple        1093.3    2.83%    30.96    75.14%
Coca-Cola     842.8    1.22%    10.25    24.86%
R(w)                            41.21


TABLE 2.18: Risk decomposition of the 99% Gaussian expected shortfall

Asset        w_i       MR_i     RC_i     RC_i*
Apple        1093.3    3.24%    35.47    75.14%
Coca-Cola     842.8    1.39%    11.74    24.86%
R(w)                            47.21
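As a cross-check of Tables 2.17 and 2.18, the Gaussian decomposition can be recomputed from the inputs of Example 19, with the means set to zero. The sketch below assumes Python's standard `statistics.NormalDist` for $\Phi^{-1}$ and $\phi$; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Inputs of Example 19 (means are set to zero).
w = [1093.3, 842.8]            # nominal exposures in $
vol = [0.013611, 0.009468]     # daily standard deviations of returns
rho = 0.120787                 # cross-correlation
alpha = 0.99

# Covariance matrix of the two daily returns.
cov = [[vol[0] ** 2, rho * vol[0] * vol[1]],
       [rho * vol[0] * vol[1], vol[1] ** 2]]

sw = [cov[i][0] * w[0] + cov[i][1] * w[1] for i in range(2)]   # (Sigma w)_i
port_vol = math.sqrt(w[0] * sw[0] + w[1] * sw[1])              # sqrt(w' Sigma w)

z = NormalDist().inv_cdf(alpha)
scalings = {
    "VaR": z,                                    # c = Phi^{-1}(alpha)
    "ES": NormalDist().pdf(z) / (1.0 - alpha),   # c = phi(Phi^{-1}(alpha)) / (1 - alpha)
}

results = {}
for name, c in scalings.items():
    mr = [c * sw_i / port_vol for sw_i in sw]        # marginal risks
    rc = [w_i * mr_i for w_i, mr_i in zip(w, mr)]    # risk contributions
    total = c * port_vol                             # risk measure R(w)
    results[name] = (mr, rc, total)
    print(name,
          [f"{100 * m:.2f}%" for m in mr],
          [round(x, 2) for x in rc],
          round(total, 2))
```

Up to rounding, the printed marginal risks, risk contributions and totals should agree with the two tables, and the relative contributions are the same for both risk measures because only the scaling factor $c$ changes.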


2.3.2 Application to non-normal risk measures 
2.3.2.1 Main results 


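In the previous section, we provided formulas for the case where asset returns are normally
distributed. However, the previous expressions can be extended to the general case. For the
value-at-risk, Gouriéroux et al. (2000) show that the risk contribution is equal to$^{93}$:
$$\begin{aligned}
\mathcal{RC}_i &= \mathcal{R}(\Pi_i \mid \Pi) \\
&= -\mathbb{E}\left[ \Pi_i \mid \Pi = -\mathrm{VaR}_\alpha(\Pi) \right] \\
&= \mathbb{E}\left[ L_i \mid L(w) = \mathrm{VaR}_\alpha(L) \right]
\end{aligned} \tag{2.19}$$
Formula (2.19) is more general than Equation (2.17) obtained in the Gaussian case. Indeed,
we can retrieve the latter if we assume that the returns are Gaussian. We recall that the
portfolio return is $R(w) = \sum_{i=1}^{n} w_i R_i = w^\top R$. The portfolio loss is defined by
$L(w) = -R(w)$. We deduce that:
$$\begin{aligned}
\mathcal{RC}_i &= \mathbb{E}\left[ -w_i R_i \mid -R(w) = \mathrm{VaR}_\alpha(w; h) \right] \\
&= -w_i\, \mathbb{E}\left[ R_i \mid R(w) = -\mathrm{VaR}_\alpha(w; h) \right]
\end{aligned}$$
Because $R(w)$ is a linear combination of $R$, the random vector $(R, R(w))$ is Gaussian and
we have:
$$\begin{pmatrix} R \\ R(w) \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu \\ w^\top \mu \end{pmatrix}, \begin{pmatrix} \Sigma & \Sigma w \\ w^\top \Sigma & w^\top \Sigma w \end{pmatrix} \right)$$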


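92 We set $\mu_1 = \mu_2 = 0$.
93 See also Hallerbach (2003).

We know that $\mathrm{VaR}_\alpha(w; h) = -w^\top \mu + \Phi^{-1}(\alpha) \sqrt{w^\top \Sigma w}$. It follows that$^{94}$:
$$\begin{aligned}
\mathbb{E}\left[ R \mid R(w) = -\mathrm{VaR}_\alpha(w; h) \right] &= \mathbb{E}\left[ R \mid R(w) = w^\top \mu - \Phi^{-1}(\alpha) \sqrt{w^\top \Sigma w} \right] \\
&= \mu + \Sigma w \left( w^\top \Sigma w \right)^{-1} \left( w^\top \mu - \Phi^{-1}(\alpha) \sqrt{w^\top \Sigma w} - w^\top \mu \right)
\end{aligned}$$
and:
$$\mathbb{E}\left[ R \mid R(w) = -\mathrm{VaR}_\alpha(w; h) \right] = \mu - \Phi^{-1}(\alpha)\, \frac{\Sigma w}{\sqrt{w^\top \Sigma w}}$$
We finally obtain the same expression as Equation (2.17):
$$\begin{aligned}
\mathcal{RC}_i &= -w_i \cdot \left( \mu_i - \Phi^{-1}(\alpha)\, \frac{(\Sigma w)_i}{\sqrt{w^\top \Sigma w}} \right) \\
&= -w_i \mu_i + \Phi^{-1}(\alpha)\, \frac{w_i \cdot (\Sigma w)_i}{\sqrt{w^\top \Sigma w}}
\end{aligned}$$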


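In the same way, Tasche (2002) shows that the general expression of the risk contributions
for the expected shortfall is:
$$\begin{aligned}
\mathcal{RC}_i &= \mathcal{R}(\Pi_i \mid \Pi) \\
&= -\mathbb{E}\left[ \Pi_i \mid \Pi \leq -\mathrm{VaR}_\alpha(\Pi) \right] \\
&= \mathbb{E}\left[ L_i \mid L(w) \geq \mathrm{VaR}_\alpha(L) \right]
\end{aligned} \tag{2.20}$$
Using Bayes' theorem, it follows that:
$$\mathcal{RC}_i = \frac{\mathbb{E}\left[ L_i \cdot \mathbb{1}\{ L(w) \geq \mathrm{VaR}_\alpha(L) \} \right]}{1 - \alpha}$$
If we apply the previous formula to the Gaussian case, we obtain:
$$\mathcal{RC}_i = -\frac{w_i}{1 - \alpha}\, \mathbb{E}\left[ R_i \cdot \mathbb{1}\{ R(w) \leq -\mathrm{VaR}_\alpha(L) \} \right]$$
After some tedious computations, we retrieve the same expression as found previously$^{95}$.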


2.3.2.2 Calculating risk contributions with historical and simulated scenarios 


The case of value-at-risk When using historical or simulated scenarios, the VaR is
calculated as follows:
$$\mathrm{VaR}_\alpha(w; h) = -\Pi_{((1-\alpha) n_S : n_S)} = L_{(\alpha n_S : n_S)}$$
Let $\mathfrak{R}_\Pi(s)$ be the rank of the P&L associated to the $s$-th observation meaning that:
$$\mathfrak{R}_\Pi(s) = \sum_{j=1}^{n_S} \mathbb{1}\{ \Pi_j \leq \Pi_s \}$$


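94 We use the formula of the conditional expectation presented in Appendix A.2.2.4 on page 1062.
95 The derivation of the formula is left as an exercise (Section 2.4.9 on page 123).

We deduce that:
$$\Pi_s = \Pi_{(\mathfrak{R}_\Pi(s) : n_S)}$$
Formula (2.19) is then equivalent to decomposing $\Pi_{((1-\alpha) n_S : n_S)}$ into individual P&Ls. We
have $\Pi_s = \sum_{i=1}^{n} \Pi_{i,s}$ where $\Pi_{i,s}$ is the P&L of the $i$-th sub-portfolio for the $s$-th scenario. It
follows that:
$$\begin{aligned}
\mathrm{VaR}_\alpha(w; h) &= -\Pi_{((1-\alpha) n_S : n_S)} \\
&= -\Pi_{\mathfrak{R}_\Pi^{-1}((1-\alpha) n_S)} \\
&= -\sum_{i=1}^{n} \Pi_{i, \mathfrak{R}_\Pi^{-1}((1-\alpha) n_S)}
\end{aligned}$$
where $\mathfrak{R}_\Pi^{-1}$ is the inverse function of the rank. We finally deduce that:
$$\mathcal{RC}_i = -\Pi_{i, \mathfrak{R}_\Pi^{-1}((1-\alpha) n_S)} = L_{i, \mathfrak{R}_\Pi^{-1}((1-\alpha) n_S)}$$
The risk contribution of the $i$-th sub-portfolio is the loss of the $i$-th sub-portfolio corresponding
to the scenario $\mathfrak{R}_\Pi^{-1}((1-\alpha) n_S)$. If $(1-\alpha) n_S$ is not an integer, we have:
$$\mathcal{RC}_i = -\left( \Pi_{i, \mathfrak{R}_\Pi^{-1}(q)} + \left( (1-\alpha) n_S - q \right) \left( \Pi_{i, \mathfrak{R}_\Pi^{-1}(q+1)} - \Pi_{i, \mathfrak{R}_\Pi^{-1}(q)} \right) \right)$$
where $q = q_\alpha(n_S)$ is the integer part of $(1-\alpha) n_S$.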

Let us consider Example 13 on page 68. We have found that the historical value-at-risk
is $47.39. It corresponds to the linear interpolation between the second and third largest
losses. Using results in Table 2.7 on page 70, we notice that $\mathfrak{R}_\Pi^{-1}(1) = 236$, $\mathfrak{R}_\Pi^{-1}(2) = 69$,
$\mathfrak{R}_\Pi^{-1}(3) = 85$, $\mathfrak{R}_\Pi^{-1}(4) = 23$ and $\mathfrak{R}_\Pi^{-1}(5) = 242$. We deduce that the second and third order
statistics correspond to the 69th and 85th historical scenarios. The risk decomposition is
reported in Table 2.19. Therefore, we calculate the risk contribution of the Apple stock as
follows:
$$\begin{aligned}
\mathcal{RC}_1 &= -\frac{1}{2} \left( \Pi_{1,69} + \Pi_{1,85} \right) \\
&= -\frac{1}{2} \left( 10 \times (105.16 - 109.33) + 10 \times (104.72 - 109.33) \right) \\
&= \$43.9
\end{aligned}$$
For the Coca-Cola stock, we obtain:
$$\begin{aligned}
\mathcal{RC}_2 &= -\frac{1}{2} \left( \Pi_{2,69} + \Pi_{2,85} \right) \\
&= -\frac{1}{2} \left( 20 \times (41.65 - 42.14) + 20 \times (42.28 - 42.14) \right) \\
&= \$3.5
\end{aligned}$$


If we compare these results with those obtained with the Gaussian VaR, we observe that
the risk decomposition is more concentrated for the historical VaR. Indeed, the Apple stock
represents 92.68% of the risk (Table 2.19) whereas it was previously equal to 75.14%. The
problem is that the estimator of the risk contribution only uses two observations, implying
that its variance is very high.

TABLE 2.19: Risk decomposition of the 99% historical value-at-risk

Asset        w_i       MR_i     RC_i     RC_i*
Apple        56.47%    77.77    43.92    92.68%
Coca-Cola    43.53%     7.97     3.47     7.32%
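The order-statistic decomposition above can be sketched in a few lines of Python. The function below is a minimal illustration rather than a reference implementation: it sorts the scenarios by portfolio P&L, which plays the role of the inverse rank function, and then applies weights to selected order statistics (here one half on the second and third lowest P&Ls, as in the worked example). The Apple and Coca-Cola P&Ls of the 69th and 85th scenarios are taken from the calculation above and the largest loss is the one used later for the expected shortfall; the three remaining scenarios are hypothetical fillers added only so that the sorting step has something to do.

```python
def order_statistic_contributions(pnl, weights):
    """Decompose a weighted order-statistic VaR into sub-portfolio contributions.

    pnl[s][i] is the P&L of sub-portfolio i in scenario s.
    weights maps an order-statistic index k (1 = lowest portfolio P&L)
    to its weight; the weights are assumed to sum to one.
    """
    totals = [sum(row) for row in pnl]
    # order[k - 1] is the scenario whose portfolio P&L has rank k.
    order = sorted(range(len(pnl)), key=lambda s: totals[s])
    rc = [0.0] * len(pnl[0])
    for k, weight in weights.items():
        scenario = order[k - 1]
        for i, pnl_i in enumerate(pnl[scenario]):
            rc[i] -= weight * pnl_i   # contribution = minus the weighted P&L
    return sum(rc), rc

pnl = [
    [-87.39, 3.05],                                   # largest loss of the example
    [10 * (105.16 - 109.33), 20 * (41.65 - 42.14)],   # 69th historical scenario
    [10 * (104.72 - 109.33), 20 * (42.28 - 42.14)],   # 85th historical scenario
    [-20.0, -5.0],                                    # hypothetical filler scenario
    [10.0, 2.0],                                      # hypothetical filler scenario
    [5.0, -1.0],                                      # hypothetical filler scenario
]

var_hist, rc_hist = order_statistic_contributions(pnl, {2: 0.5, 3: 0.5})
print(round(var_hist, 2), [round(x, 2) for x in rc_hist])
```

With the rounded prices of the worked example, the contributions are about $43.9 for Apple and $3.5 for Coca-Cola. Passing other weights to the same function spreads the estimate over more order statistics.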


We can consider three techniques to improve the efficiency of the estimator
$\mathcal{RC}_i = L_{i, \mathfrak{R}_\Pi^{-1}(n_S(1-\alpha))}$. The first approach is to use a regularization method (Scaillet, 2004). The
idea is to estimate the value-at-risk by weighting the order statistics:
$$\begin{aligned}
\mathrm{VaR}_\alpha(w; h) &= -\sum_{s=1}^{n_S} \varpi_\alpha(s; n_S)\, \Pi_{(s : n_S)} \\
&= -\sum_{s=1}^{n_S} \varpi_\alpha(s; n_S)\, \Pi_{\mathfrak{R}_\Pi^{-1}(s)}
\end{aligned}$$
where $\varpi_\alpha(s; n_S)$ is a weight function dependent on the confidence level $\alpha$. The expression
of the risk contribution then becomes:
$$\mathcal{RC}_i = -\sum_{s=1}^{n_S} \varpi_\alpha(s; n_S)\, \Pi_{i, \mathfrak{R}_\Pi^{-1}(s)}$$
Of course, this naive method can be improved by using more sophisticated approaches such
as importance sampling (Glasserman, 2005).


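In the second approach, asset returns are assumed to be elliptically distributed. In this
case, Carroll et al. (2001) show that$^{96}$:
$$\mathcal{RC}_i = \mathbb{E}[L_i] + \frac{\mathrm{cov}(L, L_i)}{\sigma^2(L)} \left( \mathrm{VaR}_\alpha(L) - \mathbb{E}[L] \right) \tag{2.21}$$
Estimating the risk contributions with historical scenarios is then straightforward. It suffices
to apply Formula (2.21) by replacing the statistical moments by their sample statistics:
$$\mathcal{RC}_i = \bar{L}_i + \frac{\sum_{s=1}^{n_S} \left( L_s - \bar{L} \right) \left( L_{i,s} - \bar{L}_i \right)}{\sum_{s=1}^{n_S} \left( L_s - \bar{L} \right)^2} \left( \mathrm{VaR}_\alpha(L) - \bar{L} \right)$$
where $\bar{L}_i = n_S^{-1} \sum_{s=1}^{n_S} L_{i,s}$ and $\bar{L} = n_S^{-1} \sum_{s=1}^{n_S} L_s$. Equation (2.21) can be viewed as the
estimation of the conditional expectation $\mathbb{E}[L_i \mid L = \mathrm{VaR}_\alpha(L)]$ in a linear regression framework:
$$L_i = \beta L + \varepsilon_i$$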


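96 We verify that the sum of the risk contributions is equal to the value-at-risk:
$$\begin{aligned}
\sum_{i=1}^{n} \mathcal{RC}_i &= \sum_{i=1}^{n} \mathbb{E}[L_i] + \left( \mathrm{VaR}_\alpha(L) - \mathbb{E}[L] \right) \sum_{i=1}^{n} \frac{\mathrm{cov}(L, L_i)}{\sigma^2(L)} \\
&= \mathbb{E}[L] + \left( \mathrm{VaR}_\alpha(L) - \mathbb{E}[L] \right) \\
&= \mathrm{VaR}_\alpha(L)
\end{aligned}$$

Because the least squares estimator is $\hat{\beta} = \mathrm{cov}(L, L_i) / \sigma^2(L)$, we deduce that:
$$\begin{aligned}
\mathbb{E}\left[ L_i \mid L = \mathrm{VaR}_\alpha(L) \right] &= \hat{\beta}\, \mathrm{VaR}_\alpha(L) + \left( \mathbb{E}[L_i] - \hat{\beta}\, \mathbb{E}[L] \right) \\
&= \mathbb{E}[L_i] + \hat{\beta} \left( \mathrm{VaR}_\alpha(L) - \mathbb{E}[L] \right)
\end{aligned}$$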

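Epperlein and Smillie (2006) extend Formula (2.21) to the case of non-elliptical distributions.
If we consider the generalized conditional expectation $\mathbb{E}[L_i \mid L = x] = f(x)$ where
the function $f$ is unknown, the estimator is given by the kernel regression$^{97}$:
$$\hat{f}(x) = \frac{\sum_{s=1}^{n_S} \mathcal{K}(L_s - x)\, L_{i,s}}{\sum_{s=1}^{n_S} \mathcal{K}(L_s - x)}$$
where $\mathcal{K}(u)$ is the kernel function. We deduce that:
$$\mathcal{RC}_i = \hat{f}\left( \mathrm{VaR}_\alpha(L) \right)$$
Epperlein and Smillie (2006) note however that this risk decomposition does not satisfy the
Euler allocation principle. This is why they propose the following correction:
$$\begin{aligned}
\mathcal{RC}_i &= \frac{\mathrm{VaR}_\alpha(L)}{\sum_{j=1}^{n} \mathcal{RC}_j}\, \hat{f}\left( \mathrm{VaR}_\alpha(L) \right) \\
&= \mathrm{VaR}_\alpha(L)\, \frac{\sum_{s=1}^{n_S} \mathcal{K}\left( L_s - \mathrm{VaR}_\alpha(L) \right) L_{i,s}}{\sum_{j=1}^{n} \sum_{s=1}^{n_S} \mathcal{K}\left( L_s - \mathrm{VaR}_\alpha(L) \right) L_{j,s}} \\
&= \mathrm{VaR}_\alpha(L)\, \frac{\sum_{s=1}^{n_S} \mathcal{K}\left( L_s - \mathrm{VaR}_\alpha(L) \right) L_{i,s}}{\sum_{s=1}^{n_S} \mathcal{K}\left( L_s - \mathrm{VaR}_\alpha(L) \right) L_s}
\end{aligned}$$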
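The kernel estimator and its rescaling can be sketched as follows. This is an illustration under assumptions that are not in the text: the kernel is taken to be Gaussian with an explicit bandwidth parameter (the formulas above write the kernel without a bandwidth), and the loss scenarios and value-at-risk level are hypothetical.

```python
import math

def gaussian_kernel(u, bandwidth):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratios."""
    return math.exp(-0.5 * (u / bandwidth) ** 2)

def kernel_contributions(losses, var_level, bandwidth):
    """Return (raw, corrected) kernel-based risk contributions.

    losses[s][i] is the loss of sub-portfolio i in scenario s.
    raw[i] is the Nadaraya-Watson estimate of E[L_i | L = VaR].
    corrected[i] rescales raw[i] so that the contributions sum to the VaR.
    """
    total = [sum(row) for row in losses]
    k = [gaussian_kernel(l - var_level, bandwidth) for l in total]
    k_sum = sum(k)
    n = len(losses[0])
    raw = [sum(k_s * row[i] for k_s, row in zip(k, losses)) / k_sum
           for i in range(n)]
    scale = var_level / sum(raw)
    corrected = [scale * r for r in raw]
    return raw, corrected

# Hypothetical loss scenarios for two sub-portfolios.
losses = [[1.0, 0.5], [-2.0, 1.0], [3.0, -0.5],
          [0.5, 2.0], [-1.0, -1.5], [4.0, 1.0]]
var_level = 4.5   # hypothetical value-at-risk of the portfolio loss

rc_raw, rc_kernel = kernel_contributions(losses, var_level, bandwidth=1.0)
print(sum(rc_raw), sum(rc_kernel))
```

The raw estimates generally do not add up to the value-at-risk, whereas the rescaled ones do by construction; the rescaling leaves the relative contributions unchanged.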


In Table 2.20, we have reported the risk contributions of the 99% value-at-risk for Apple
and Coca-Cola stocks. The case G corresponds to the Gaussian value-at-risk whereas all the
other cases correspond to the historical value-at-risk. For the case R1, the regularization
weights are $\varpi_{99\%}(2; 250) = \varpi_{99\%}(3; 250) = \frac{1}{2}$ and $\varpi_{99\%}(s; 250) = 0$ when $s \neq 2$ and $s \neq 3$.
It corresponds to the classical interpolation method between the second and third order
statistics. For the case R2, we have $\varpi_{99\%}(s; 250) = \frac{1}{4}$ when $s \leq 4$ and $\varpi_{99\%}(s; 250) = 0$
when $s > 4$. The value-at-risk is therefore estimated by averaging the first four order
statistics. The cases E and K correspond to the methods based on the elliptical and kernel
approaches. For these two cases, we obtain a risk decomposition that is closer to the one
obtained with the Gaussian method. This is quite logical as the Gaussian distribution is a
special case of elliptical distributions and the kernel function is also Gaussian.


TABLE 2.20: Risk contributions calculated with regularization techniques

Asset        G        R1       R2       E        K
Apple        30.97    43.92    52.68    35.35    39.21
Coca-Cola    10.25     3.47     2.29    12.03     8.17
R(w)         41.21    47.39    54.96    47.39    47.39


97 $\hat{f}(x)$ is called the Nadaraya-Watson estimator (see Section 10.1.4.2 on page 641).

Example 20 Let $L = L_1 + L_2$ be the portfolio loss where $L_i$ ($i = 1, 2$) is defined as follows:
$$L_i = w_i \left( \mu_i + \sigma_i T_i \right)$$
and $T_i$ has a Student's t distribution with the number of degrees of freedom $\nu_i$. The
dependence function between the losses $(L_1, L_2)$ is given by the Clayton copula:
$$C(u_1, u_2) = \left( u_1^{-\theta} + u_2^{-\theta} - 1 \right)^{-1/\theta}$$
For the numerical illustration, we consider the following values: $w_1 = 100$, $\mu_1 = 10\%$,
$\sigma_1 = 20\%$, $\nu_1 = 6$, $w_2 = 200$, $\mu_2 = 10\%$, $\sigma_2 = 25\%$, $\nu_2 = 4$ and $\theta = 2$. The confidence level
$\alpha$ of the value-at-risk is set to 90%.


FIGURE 2.19: Density function of the different risk contribution estimators
In Figure 2.19, we compare the different statistical estimators of the risk contribution
$\mathcal{RC}_1$ when we use $n_S = 5\,000$ simulations. Concerning the regularization method, we
consider the following weight function applied to the order statistics of losses$^{98}$:
$$\varpi_\alpha^{L}(s; n_S) = \frac{1}{2 h n_S + 1} \cdot \mathbb{1}\left\{ \frac{\left| s - \bar{q}_\alpha(n_S) \right|}{n_S} \leq h \right\}$$
It corresponds to a uniform kernel on the range $[\bar{q}_\alpha(n_S) - h n_S, \bar{q}_\alpha(n_S) + h n_S]$. In the first
panel, we report the probability density function of $\mathcal{RC}_1$ when $h$ is equal to 0% and 2.5%.
The case $h = 0\%$ is the estimator based on only one observation. We verify that the variance
of this estimator is larger for $h = 0\%$ than for $h = 2.5\%$. However, we notice that this last
estimator is a little biased, because we estimate the quantile 90% by averaging the order
statistics corresponding to the range [87.5%, 92.5%]. In the second panel, we compare the
weighting method with the elliptical and kernel approaches. These two estimators have a
smaller variance, but a larger bias because they assume that the loss distribution is elliptical
or may be estimated using a Gaussian kernel. Finally, the third panel shows the probability
density function of $\mathcal{RC}_1$ estimated with the Gaussian value-at-risk.

98 This is equivalent to using this weight function applied to the order statistics of P&Ls:
$$\varpi_\alpha(s; n_S) = \frac{1}{2 h n_S + 1} \cdot \mathbb{1}\left\{ \frac{\left| s - q_\alpha(n_S) \right|}{n_S} \leq h \right\}$$


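The case of expected shortfall On page 70, we have shown that the expected shortfall
is estimated as follows:
$$\begin{aligned}
\mathrm{ES}_\alpha(L) &= \frac{1}{q_\alpha(n_S)} \sum_{s=1}^{n_S} \mathbb{1}\{ L_s \geq \mathrm{VaR}_\alpha(L) \} \cdot L_s \\
&= -\frac{1}{q_\alpha(n_S)} \sum_{s=1}^{n_S} \mathbb{1}\{ \Pi_s \leq -\mathrm{VaR}_\alpha(L) \} \cdot \Pi_s
\end{aligned}$$
It corresponds to the average of the losses that are greater than or equal to the value-at-risk.
It follows that:
$$\begin{aligned}
\mathrm{ES}_\alpha(L) &= -\frac{1}{q_\alpha(n_S)} \sum_{s=1}^{q_\alpha(n_S)} \Pi_{(s : n_S)} \\
&= -\frac{1}{q_\alpha(n_S)} \sum_{s=1}^{q_\alpha(n_S)} \Pi_{\mathfrak{R}_\Pi^{-1}(s)} \\
&= -\frac{1}{q_\alpha(n_S)} \sum_{s=1}^{q_\alpha(n_S)} \sum_{i=1}^{n} \Pi_{i, \mathfrak{R}_\Pi^{-1}(s)}
\end{aligned}$$
We deduce that:
$$\begin{aligned}
\mathcal{RC}_i &= -\frac{1}{q_\alpha(n_S)} \sum_{s=1}^{q_\alpha(n_S)} \Pi_{i, \mathfrak{R}_\Pi^{-1}(s)} \\
&= \frac{1}{q_\alpha(n_S)} \sum_{s=1}^{q_\alpha(n_S)} L_{i, \mathfrak{R}_\Pi^{-1}(s)}
\end{aligned}$$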


In the Apple/Coca-Cola example, we recall that the 99% daily value-at-risk is equal to
$47.39. The corresponding expected shortfall is then the average of the two largest losses:
$$\mathrm{ES}_\alpha(w; \text{one day}) = \frac{84.34 + 51.46}{2} = \$67.90$$
For the risk contribution, we obtain$^{99}$:
$$\mathcal{RC}_1 = \frac{87.39 + 41.69}{2} = \$64.54$$
and:
$$\mathcal{RC}_2 = \frac{-3.05 + 9.77}{2} = \$3.36$$
The corresponding risk decomposition is given in Tables 2.21 and 2.22 for $\alpha = 99\%$ and
$\alpha = 97.5\%$. With the new rules of Basel III, the capital is higher for this example.

99 Because we have:
$$\Pi_{(1:250)} = -87.39 + 3.05 = -84.34$$
and:
$$\Pi_{(2:250)} = -41.69 - 9.77 = -51.46$$
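The historical expected shortfall decomposition amounts to averaging the sub-portfolio losses over the worst scenarios. The Python sketch below illustrates this with the two largest losses quoted in the footnote; the three other scenarios are hypothetical fillers, added only so that the function has to select the two lowest portfolio P&Ls.

```python
def historical_es_contributions(pnl, q):
    """Average the sub-portfolio losses over the q lowest portfolio P&Ls.

    pnl[s][i] is the P&L of sub-portfolio i in scenario s.
    Returns (expected shortfall, list of risk contributions).
    """
    totals = [sum(row) for row in pnl]
    # Scenarios sorted by increasing portfolio P&L (inverse rank function).
    order = sorted(range(len(pnl)), key=lambda s: totals[s])
    worst = order[:q]
    n = len(pnl[0])
    rc = [-sum(pnl[s][i] for s in worst) / q for i in range(n)]
    return sum(rc), rc

pnl = [
    [10.0, 2.0],        # hypothetical filler scenario
    [-41.69, -9.77],    # second largest loss (Apple, Coca-Cola)
    [-20.0, -5.0],      # hypothetical filler scenario
    [-87.39, 3.05],     # largest loss (Apple, Coca-Cola)
    [5.0, -1.0],        # hypothetical filler scenario
]

es_hist, rc_es = historical_es_contributions(pnl, q=2)
print(round(es_hist, 2), [round(x, 2) for x in rc_es])
```

With q = 2, the function recovers the expected shortfall of $67.90 and the contributions of $64.54 and $3.36 obtained above.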


TABLE 2.21: Risk decomposition of the 99% historical expected shortfall

Asset        w_i       MR_i      RC_i     RC_i*
Apple        56.47%    114.29    64.54    95.05%
Coca-Cola    43.53%      7.72     3.36     4.95%
R(w)                             67.90


TABLE 2.22: Risk decomposition of the 97.5% historical expected shortfall

Asset        w_i       MR_i      RC_i     RC_i*
Apple        56.47%    78.48     44.32    91.31%
Coca-Cola    43.53%     9.69      4.22     8.69%

FIGURE 2.20: Probability density function of the $\mathcal{RC}_1$ estimator for the 99% VaR and
97.5% ES


In Figure 2.20, we report the probability density function of the $\mathcal{RC}_1$ estimator in the
case of Example 20. We consider the 99% value-at-risk and the 97.5% expected shortfall with
$n_S = 5\,000$ simulated scenarios. For the VaR risk measure, the risk contribution is estimated
using respectively only one single observation and a weighting function corresponding to a
uniform window$^{100}$. We notice that the estimator has a smaller variance with the expected
shortfall risk measure. Of course, we can always reduce the variance of ES risk contributions
by using the previous smoothing techniques (Scaillet, 2004), but this is less of an issue than
for the value-at-risk measure.


2.4 Exercises 


2.4.1 Calculating regulatory capital with the Basel I standardized measurement method
1. We consider an interest rate portfolio with the following exposures: a long position 
of $100 mn on four-month instruments, a short position of $50 mn on five-month 
instruments, a long position of $10 mn on fifteen-year instruments and a short position 
of $50 mn on twelve-year instruments. 


(a) Using BCBS (1996a), explain the maturity approach for computing the capital 
requirement due to the interest rate risk. 


(b) By assuming that the instruments correspond to bonds with coupons larger than 
3%, calculate the capital requirement of the trading portfolio. 


2. We consider the following portfolio of stocks: 


Stock 3M Exxon IBM Pfizer AT&T Cisco Oracle
$\mathcal{L}_i$ 100 100 10 50 60 90
$\mathcal{S}_i$ 50 80


where $\mathcal{L}_i$ and $\mathcal{S}_i$ indicate the long and short exposures on stock $i$ expressed in $ mn.


(a) Calculate the capital charge for the specific risk. 
(b) Calculate the capital charge for the general market risk. 


(c) How can the investor hedge the market risk of his portfolio by using S&P 500 
futures contracts? What is the corresponding capital charge? Verify that the 
investor minimizes the total capital charge in this case. 


3. We consider a net exposure $N_w$ on an equity portfolio $w$. We note $\sigma(w)$ the annualized
volatility of the portfolio return.


(a) Calculate the required capital under the standardized measurement method.


(b) Calculate the required capital under the internal model method if we assume
that the bank uses a Gaussian value-at-risk$^{101}$.


(c) Deduce an upper bound $\sigma(w) \leq \sigma^\star$ under which the required capital under
SMM is higher than the required capital under IMA.


(d) Comment on these results. 


100 We set $h = 0.5\%$ meaning that the risk contribution is estimated with 51 observations for the 99%
value-at-risk.
101 We consider the Basel II capital requirement rules.

4. We consider the portfolio with the following long and short positions expressed in $ mn:

Asset EUR JPY CAD Gold Sugar Corn Cocoa

$\mathcal{L}_i$ 100 50 50 50 60 90

$\mathcal{S}_i$ 100 100 50 80 110


(a) How do you explain that some assets present both long and short positions? 


(b) Calculate the required capital under the simplified approach. 


5. We consider the following positions (in $) of the commodity i: 


Time band            0-1M    1M-3M    6M-1Y    1Y-2Y    2Y-3Y    3Y+
$\mathcal{L}_i(t)$    500        0     1800      300        0      0
$\mathcal{S}_i(t)$    300      900      100      600      100    200


(a) Using BCBS (1996a), explain the maturity ladder approach for commodities. 


(b) Compute the capital requirement. 


2.4.2 Covariance matrix 


We consider a universe of three stocks A, B and C.


1. The covariance matrix of stock returns is:
$$\Sigma = \begin{pmatrix} 4\% & & \\ 3\% & 5\% & \\ 2\% & -1\% & 6\% \end{pmatrix}$$


(a) Calculate the volatility of stock returns. 


(b) Deduce the correlation matrix. 


2. We assume that the volatilities are 10%, 20% and 30% whereas the correlation matrix
is equal to:
$$\rho = \begin{pmatrix} 100\% & & \\ 50\% & 100\% & \\ 25\% & 0\% & 100\% \end{pmatrix}$$

(a) Write the covariance matrix.

(b) Calculate the volatility of the portfolio (50%, 50%, 0).

(c) Calculate the volatility of the portfolio (60%, -40%, 0). Comment on this result.

(d) We assume that the portfolio is long $150 on stock A, long $500 on stock B and
short $200 on stock C. Find the volatility of this long/short portfolio.


3. We consider that the vector of stock returns follows a one-factor model:
$$R = \beta \mathcal{F} + \varepsilon$$
We assume that $\mathcal{F}$ and $\varepsilon$ are independent. We note $\sigma_{\mathcal{F}}^2$ the variance of $\mathcal{F}$ and
$D = \mathrm{diag}\left( \tilde{\sigma}_1^2, \tilde{\sigma}_2^2, \tilde{\sigma}_3^2 \right)$ the covariance matrix of idiosyncratic risks $\varepsilon$. We use the following
numerical values: $\sigma_{\mathcal{F}} = 50\%$, $\beta_1 = 0.9$, $\beta_2 = 1.3$, $\beta_3 = 0.1$, $\tilde{\sigma}_1 = 5\%$, $\tilde{\sigma}_2 = 5\%$ and
$\tilde{\sigma}_3 = 15\%$.

(a) Calculate the volatility of stock returns.
(b) Calculate the correlation between stock returns.


4. Let $X$ and $Y$ be two independent random vectors. We note $\mu(X)$ and $\mu(Y)$ the vectors
of means and $\Sigma(X)$ and $\Sigma(Y)$ the covariance matrices. We define the random vector
$Z = (Z_1, Z_2, Z_3)$ where $Z_i$ is equal to the product $X_i Y_i$.


(a) Calculate $\mu(Z)$ and $\mathrm{cov}(Z)$.


(b) We consider that $\mu(X)$ is equal to zero and $\Sigma(X)$ corresponds to the covariance
matrix of Question 2. We assume that $Y_1$, $Y_2$ and $Y_3$ are three independent
uniform random variables $\mathcal{U}_{[0,1]}$. Calculate the 99% Gaussian value-at-risk of the
portfolio corresponding to Question 2(d) when $Z$ is the random vector of asset
returns. Compare this value with the Monte Carlo VaR.


2.4.3 Risk measure 
1. We denote F the cumulative distribution function of the loss L. 


(a) Give the mathematical definition of the value-at-risk and expected shortfall risk 
measures. 


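(b) Show that:
$$\mathrm{ES}_\alpha(L) = \frac{1}{1 - \alpha} \int_\alpha^1 F^{-1}(t)\, \mathrm{d}t$$
(c) We assume that $L$ follows a Pareto distribution $\mathcal{P}(\theta, x_-)$ defined by:
$$\Pr\{ L \leq x \} = 1 - \left( \frac{x}{x_-} \right)^{-\theta}$$
where $x \geq x_-$ and $\theta > 1$. Calculate the moments of order one and two. Interpret
the parameters $x_-$ and $\theta$. Calculate $\mathrm{ES}_\alpha(L)$ and show that:
$$\mathrm{ES}_\alpha(L) > \mathrm{VaR}_\alpha(L)$$

(d) Calculate the expected shortfall when $L$ is a Gaussian random variable $\mathcal{N}(\mu, \sigma^2)$.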
Show that: 


Deduce that:
$$\mathrm{ES}_\alpha(L) \to \mathrm{VaR}_\alpha(L) \quad \text{when } \alpha \to 1$$


(e) Comment on these results in a risk management perspective. 


2. Let R (L) be a risk measure of the loss L. 


(a) Is R(L) = E[L] a coherent risk measure? 
(b) Same question if R (L) = E [L] + o (L). 


3. We assume that the probability distribution $F$ of the loss $L$ is defined by:
$$\Pr\{ L = \ell_i \} = \begin{cases} 20\% & \text{if } \ell_i = 0 \\ 10\% & \text{if } \ell_i \in \{1, 2, 3, 4, 5, 6, 7, 8\} \end{cases}$$
(a) Calculate $\mathrm{ES}_\alpha$ for $\alpha = 50\%$, $\alpha = 75\%$ and $\alpha = 90\%$.


(b) Let us consider two losses $L_1$ and $L_2$ with the same distribution $F$. Build a joint
distribution of $(L_1, L_2)$ which does not satisfy the subadditivity property when
the risk measure is the value-at-risk.




2.4.4 Value-at-risk of a long/short portfolio 


We consider a long/short portfolio composed of a long position on asset A and a short 
position on asset B. The long exposure is equal to $2 mn whereas the short exposure is 
equal to $1 mn. Using the historical prices of the last 250 trading days of assets A and B,
we estimate that the asset volatilities $\sigma_A$ and $\sigma_B$ are both equal to 20% per year and that
the correlation $\rho_{A,B}$ between asset returns is equal to 50%. In what follows, we ignore the
mean effect.


1. Calculate the Gaussian VaR of the long/short portfolio for a one-day holding period 
and a 99% confidence level. 


2. How do you calculate the historical VaR? Using the historical returns of the last 250 
trading days, the five worst scenarios of the 250 simulated daily P&L of the portfolio 
are —58 700, —56 850, —54 270, —52 170 and —49 231. Calculate the historical VaR for 
a one-day holding period and a 99% confidence level. 


3. We assume that the multiplication factor $m_c$ is 3. Deduce the required capital if the
bank uses an internal model based on the Gaussian value-at-risk. Same question when
the bank uses the historical VaR. Compare these figures with those calculated with
the standardized measurement method.


4. Show that the Gaussian VaR is multiplied by a factor equal to $\sqrt{7/3}$ if the correlation
$\rho_{A,B}$ is equal to -50%. How do you explain this result?


5. The portfolio manager sells a call option on the stock A. The delta of the option is 
equal to 50%. What does the Gaussian value-at-risk of the long/short portfolio become 
if the nominal of the option is equal to $2 mn? Same question when the nominal of 
the option is equal to $4 mn. How do you explain this result? 


6. The portfolio manager replaces the short position on the stock B by selling a call
option on the stock B. The delta of the option is equal to 50%. Show that the Gaussian
value-at-risk is minimum when the nominal is equal to four times the correlation $\rho_{A,B}$.
Deduce then an expression of the lowest Gaussian VaR. Comment on these results.


2.4.5 Value-at-risk of an equity portfolio hedged with put options 
We consider two stocks A and B and an equity index I. We assume that the risk model
corresponds to the CAPM and we have:
$$R_j = \beta_j R_I + \varepsilon_j$$
where $R_j$ and $R_I$ are the returns of stock $j$ and the index. We assume that $R_I$ and $\varepsilon_j$ are
independent. The covariance matrix of idiosyncratic risks is diagonal and we note $\tilde{\sigma}_j$ the
volatility of $\varepsilon_j$.


1. The parameters are the following: $\sigma^2(R_I) = 4\%$, $\beta_A = 0.5$, $\beta_B = 1.5$, $\tilde{\sigma}_A^2 = 3\%$ and
$\tilde{\sigma}_B^2 = 7\%$.


(a) Calculate the volatility of stocks A and B and the cross-correlation. 
(b) Find the correlation between the stocks and the index. 


(c) Deduce the covariance matrix. 




2. The current price of stocks A and B is equal to $100 and $50 whereas the value of 
the index is equal to $50. The composition of the portfolio is 4 shares of A, 10 shares 
of B and 5 shares of I. 


(a) Determine the Gaussian value-at-risk for a confidence level of 99% and a 10-day
holding period.


(b) Using the historical returns of the last 260 trading days, the five lowest simulated
daily P&Ls of the portfolio are -62.39, -55.23, -52.06, -51.52 and -42.83.
Calculate the historical VaR for a confidence level of 99% and a 10-day holding
period.


(c) What is the regulatory capital$^{102}$ if the bank uses an internal model based on
the Gaussian value-at-risk? Same question when the bank uses the historical
value-at-risk. Compare these figures with those calculated with the standardized
measurement method.


3. The portfolio manager would like to hedge the directional risk of the portfolio. For
that, he purchases put options on the index I at a strike of $45 with a delta equal to
-25%. Write the expression of the P&L using the delta approach.


(a) How many options should the portfolio manager purchase for hedging 50% of the
index exposure? Deduce the Gaussian value-at-risk of the corresponding portfolio.

(b) The portfolio manager believes that the purchase of 96 put options minimizes 
the value-at-risk. What is the basis for his reasoning? Do you think that it is 
justified? Calculate then the Gaussian VaR of this new portfolio. 


2.4.6 Risk management of exotic options 


Let us consider a short position on an exotic option, whose current value $C_t$ is equal to
$6.78. We assume that the price $S_t$ of the underlying asset is $100 and the implied volatility
$\Sigma_t$ is equal to 20%.


1. At time $t + 1$, the value of the underlying asset is $97 and the implied volatility remains
constant. We find that the P&L of the trader between $t$ and $t + 1$ is equal to $1.37.
Can we explain the P&L by the sensitivities knowing that the estimates of delta $\Delta_t$,
gamma $\Gamma_t$ and vega$^{103}$ $\upsilon_t$ are respectively equal to 49%, 2% and 40%?


2. At time $t + 2$, the price of the underlying asset is $97 while the implied volatility
increases from 20% to 22%. The value of the option $C_{t+2}$ is now equal to $6.17. Can
we explain the P&L by the sensitivities knowing that the estimates of delta $\Delta_{t+1}$,
gamma $\Gamma_{t+1}$ and vega $\upsilon_{t+1}$ are respectively equal to 43%, 2% and 38%?


3. At time $t + 3$, the price of the underlying asset is $95 and the value of the implied
volatility is 19%. We find that the P&L of the trader between $t + 2$ and $t + 3$ is equal
to $0.58. Can we explain the P&L by the sensitivities knowing that the estimates of
delta $\Delta_{t+2}$, gamma $\Gamma_{t+2}$ and vega $\upsilon_{t+2}$ are respectively equal to 44%, 1.8% and 38%?


4. What can we conclude in terms of model risk? 


102 We assume that the multiplication factor $m_c$ is equal to 3.
103 Measured in volatility points.

2.4.7 P&L approximation with Greek sensitivities


1. Let C; be the value of an option at time t. Define the delta, gamma, theta and vega 
coefficients of the option. 


2. We consider a European call option with strike $K$. Give the value of the option in the
case of the Black-Scholes model. Deduce then the Greek coefficients.


3. We assume that the underlying asset is a non-dividend stock, the residual maturity of
the call option is equal to one year, the current price of the stock is equal to $100 and
the interest rate is equal to 5%. We also assume that the implied volatility is constant
and equal to 20%. In the table below, we give the value of the call option $C_0$ and the
Greek coefficients $\Delta_0$, $\Gamma_0$ and $\Theta_0$ for different values of $K$:

K             80        95       100       105       120
$C_0$         24.589    13.346    10.451     8.021     3.247
$\Delta_0$     0.929     0.728     0.637     0.542     0.287
$\Gamma_0$     0.007     0.017     0.019     0.020     0.017
$\Theta_0$    -4.776    -6.291    -6.414    -6.277    -4.681


(a) Explain how these values have been calculated. Comment on these numerical 
results. 

(b) One day later, the value of the underlying asset is $102. Using the Black-Scholes 
formula, we obtain: 


K        80        95       100       105       120
$C_1$    26.441    14.810    11.736     9.120     3.837


Explain how the option premium $C_1$ is calculated. Deduce then the P&L of a
long position on this option for each strike $K$.


(c) For each strike price, calculate an approximation of the P&L by considering the
sensitivities $\Delta$, $\Delta - \Gamma$, $\Delta - \Theta$ and $\Delta - \Gamma - \Theta$. Comment on these results.


(d) Six months later, the value of the underlying asset is $148. Repeat Questions 
3(b) and 3(c) with these new parameters. Comment on these results. 


2.4.8 Calculating the non-linear quadratic value-at-risk 


1. Let $X \sim \mathcal{N}(0, 1)$. Show that the even moments of $X$ are given by the following
relationship:
$$\mathbb{E}\left[ X^{2n} \right] = (2n - 1)\, \mathbb{E}\left[ X^{2n-2} \right]$$
with $n \in \mathbb{N}$. Calculate the odd moments of $X$.


2. We consider a long position on a call option. The current price S; of the underlying 
asset is equal to $100, whereas the delta and the gamma of the option are respectively 
equal to 50% and 2%. We assume that the annual return of the asset follows a Gaussian 
distribution with an annual volatility equal to 32.25%. 


(a) Calculate the daily Gaussian value-at-risk using the delta approximation with a 
99% confidence level. 


(b) Calculate the daily Gaussian value-at-risk by considering the delta-gamma ap- 
proximation. 


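(c) Deduce the daily Cornish-Fisher value-at-risk.

3. Let $X \sim \mathcal{N}(\mu, I)$ and $Y = X^\top A X$ with $A$ a symmetric square matrix.


(a) We recall that:
$$\begin{aligned}
\mathbb{E}[Y] &= \mu^\top A \mu + \mathrm{tr}(A) \\
\mathbb{E}\left[ Y^2 \right] &= \mathbb{E}^2[Y] + 4 \mu^\top A^2 \mu + 2\, \mathrm{tr}\left( A^2 \right)
\end{aligned}$$
Deduce the moments of $Y = X^\top A X$ when $X \sim \mathcal{N}(\mu, \Sigma)$.
(b) We suppose that $\mu = 0$. We recall that:
$$\begin{aligned}
\mathbb{E}\left[ Y^3 \right] &= \left( \mathrm{tr}(A) \right)^3 + 6\, \mathrm{tr}(A)\, \mathrm{tr}\left( A^2 \right) + 8\, \mathrm{tr}\left( A^3 \right) \\
\mathbb{E}\left[ Y^4 \right] &= \left( \mathrm{tr}(A) \right)^4 + 32\, \mathrm{tr}(A)\, \mathrm{tr}\left( A^3 \right) + 12 \left( \mathrm{tr}\left( A^2 \right) \right)^2 + 12 \left( \mathrm{tr}(A) \right)^2 \mathrm{tr}\left( A^2 \right) + 48\, \mathrm{tr}\left( A^4 \right)
\end{aligned}$$
Compute the moments, the skewness and the excess kurtosis of $Y = X^\top A X$
when $X \sim \mathcal{N}(0, \Sigma)$.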


4. We consider a portfolio $w = (w_1, \ldots, w_n)$ of options. We assume that the vector of
daily asset returns is distributed according to the Gaussian distribution $\mathcal{N}(0, \Sigma)$. We
note $\Delta$ and $\Gamma$ the vector of deltas and the matrix of gammas.


(a) Calculate the daily Gaussian value-at-risk using the delta approximation. Define
the analytical expression of the risk contributions.


(b) Calculate the daily Gaussian value-at-risk by considering the delta-gamma
approximation.


(c) Calculate the daily Cornish-Fisher value-at-risk when assuming that the portfolio 
is delta neutral. 


(d) Calculate the daily Cornish-Fisher value-at-risk in the general case by only con- 
sidering the skewness. 


5. We consider a portfolio composed of 50 options in a first asset, 20 options in a second
asset and 20 options in a third asset. We assume that the gamma matrix is:
$$\Gamma = \begin{pmatrix} 4.0\% & & \\ 1.0\% & 1.0\% & \\ 0.0\% & -0.5\% & 1.0\% \end{pmatrix}$$
The current price of the assets is normalized and is equal to 100. The daily volatility
levels of the assets are respectively equal to 1%, 1.5% and 2% whereas the correlation
matrix of asset returns is:
$$\rho = \begin{pmatrix} 100\% & & \\ 50\% & 100\% & \\ 25\% & 15\% & 100\% \end{pmatrix}$$

(a) Compare the different methods to compute the daily value-at-risk with a 99% 
confidence level if the portfolio is delta neutral. 


(b) Same question if we now consider that the deltas are equal to 50%, 40% and
60%. Compute the risk decomposition in the case of the delta and delta-gamma
approximations. What do you notice?

2.4.9 Risk decomposition of the expected shortfall


We consider a portfolio composed of $n$ assets. We assume that asset returns
$R = (R_1, \ldots, R_n)$ are normally distributed: $R \sim \mathcal{N}(\mu, \Sigma)$. We note $L(w)$ the loss of the portfolio.


1. Find the distribution of $L(w)$.
2. Define the expected shortfall $\mathrm{ES}_\alpha(w)$. Calculate its expression in the present case.
3. Calculate the risk contribution $\mathcal{RC}_i$ of asset $i$. Deduce that the expected shortfall
verifies the Euler allocation principle.


4. Give the expression of $\mathcal{RC}_i$ in terms of conditional loss. Retrieve the formula of $\mathcal{RC}_i$
found in Question 3. What is the interest of the conditional representation?


2.4.10 Expected shortfall of an equity portfolio 


We consider an investment universe, which is composed of two stocks A and B. The 
current price of the two stocks is respectively equal to $100 and $200, their volatilities are 
equal to 25% and 20% whereas the cross-correlation is equal to —20%. The portfolio is long 
on 4 stocks A and 3 stocks B. 


1. Calculate the Gaussian expected shortfall at the 97.5% confidence level for a ten-day 
time horizon. 


2. The eight worst scenarios of daily stock returns among the last 250 historical scenarios 
are the following: 
s       1      2      3      4      5      6      7      8
R_A    3%    −4%    −3%    −5%    −6%    +3%    +1%    −1%
R_B   −4%    +1%    −2%    −1%    +2%    −7%    −3%    −2%


Calculate then the historical expected shortfall at the 97.5% confidence level for a 
ten-day time horizon. 


2.4.11 Risk measure of a long/short portfolio 


We consider an investment universe, which is composed of two stocks A and B. The 
current prices of the two stocks are respectively equal to $50 and $20. Their volatilities are 
equal to 25% and 20% whereas the cross-correlation is equal to +12.5%. The portfolio is 
long on 2 stocks A and short on 5 stocks B. 


1. Gaussian risk measure 


(a) Calculate the Gaussian value-at-risk at the 99% confidence level for a ten-day 
time horizon. 

(b) Calculate the Gaussian expected shortfall at the 97.5% confidence level for a 
ten-day time horizon. 


2. Historical risk measure 
The ten worst scenarios of daily stock returns (expressed in %) among the last 250 
historical scenarios are the following: 


s       1      2      3      4      5      6      7      8      9     10
R_A   −0.6   −3.7   −5.8   −4.2   −3.7    0.0   −5.7   −4.3   −1.7   −4.1
R_B    5.7    2.3   −0.7    0.6    0.9    4.5   −1.4    0.0    2.3   −0.2
D     −6.3   −6.0   −5.1   −4.8   −4.6   −4.5   −4.3   −4.3   −4.0   −3.9


where D = R_A − R_B is the difference of the returns.
(a) Calculate the historical value-at-risk at the 99% confidence level for a ten-day
time horizon. 

(b) Calculate the historical expected shortfall at the 97.5% confidence level for a 
ten-day time horizon. 


(c) Give an approximation of the capital charge under Basel II, Basel 2.5 and Basel
III standards by considering the historical risk measure¹⁰⁴.


2.4.12 Kernel estimation of the expected shortfall 


1. We consider a random variable X. We note K(u) the kernel function associated to
the sample $\{x_1, \ldots, x_n\}$. Show that:

$$\mathbb{E}\left[X \cdot \mathbb{1}\{X \leq x\}\right] = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\frac{x - x_i}{h}} x_i K(u) \, du + \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\frac{x - x_i}{h}} h u K(u) \, du$$

2. Find the expression of the first term by considering the integrated kernel function
$\mathcal{I}(u)$.
3. Show that the second term tends to zero when $h \to 0$.


4. Deduce an approximation of the expected shortfall $\mathrm{ES}_\alpha(w; h)$.


¹⁰⁴We assume that the multiplicative factor is equal to 3 (Basel II), and the ‘stressed’ risk measure is 2
times the ‘normal’ risk measure (Basel 2.5).


Chapter 3 


Credit Risk 


In this chapter, we give an overview of the credit market. It concerns loans and bonds, 
but also credit derivatives whose development was impressive during the 2000s. A thor- 
ough knowledge of the products is necessary to understand the regulatory framework for 
computing the capital requirements for credit risk. In the second section, we will therefore
compare Basel I, Basel II and Basel III approaches. The case of counterparty credit
risk will be treated in the next chapter, which focuses on collateral risk. Finally, the last 
section is dedicated to the modeling of credit risk. We will develop the statistical methods 
for modeling and estimating the main parameters (probability of default, loss given default 
and default correlations) and we will show the tools of credit risk management. Concerning 
credit scoring models, we refer to Chapter 15, which is fully dedicated to this topic.


3.1 The market of credit risk 
3.1.1 The loan market 


In this section, we present the traditional debt market of loans based on banking inter- 
mediation, as opposed to the financial market of debt securities (money market instruments, 
bonds and notes). We generally distinguish this credit market along two main lines: coun- 
terparties and products. 


Counterparties are divided into 4 main categories: sovereign, financial, corporate and 
retail. Banking groups have adopted this customer-oriented approach by differentiating 
retail banking and corporate and investment banking (CIB) businesses. Retail banking 
refers to individuals. It may also include micro-sized firms and small and medium-sized 
enterprises (SME). CIBs concern middle market firms, corporates, financial institutions 
and public entities. In retail banking, the bank pursues a client segmentation, meaning 
that all the clients that belong to the same segment have the same conditions in terms
of financing and financial investments. This also implies that the pricing of the loan is the
same for two individuals of the same segment. The issue for the bank is then whether or
not to propose a loan offer to its client. For that, the bank uses statistical decision-making methods,
which are called credit scoring models. Contrary to this binary approach (yes or no), CIBs
have a personalized approach to their clients. They estimate their probability of default and
change the pricing condition of the loan on the basis of the results. A client with a low
default probability will have a lower rate or credit spread than a client with a higher default 
probability for the same loan. 


The household credit market is organized as follows: mortgage and housing debt, con- 
sumer credit and student loans. A mortgage is a debt instrument secured by the collateral 
of a real estate property. In the case where the borrower defaults on the loan, the lender 
can take possession and sell the secured property. For instance, the home buyer pledges 
his house to the bank in a residential mortgage. This type of credit is very frequent in 
English-speaking countries, notably England and the United States. In continental Europe,
home loans are generally not collateralized for a primary home. This is not always the case 
for buy-to-let investments and second-home loans. Consumer credit is used for equipment 
financing or leasing. We usually make the distinction between auto loans, credit cards, re- 
volving credit and other loans (personal loans and sales financing). Auto loans are personal 
loans to purchase a car. Credit cards and revolving credit are two forms of personal lines 
of credit. Revolving credit facilities for individuals are very popular in the US. They can be
secured, as in the case of a home equity line of credit (HELOC). Student loans are used
to finance educational expenses, for instance post-graduate studies at the university. The 
corporate credit market is organized differently, because large corporates have access to the 
financial market for long-term financing. This explains that revolving credit facilities are 
essential to provide liquidity for the firm’s day-to-day operations. The average maturity is 
then lower for corporates than for individuals. 


Credit statistics for the private non-financial sector (households and non-financial cor- 
porations) are reported in Figures 3.1 and 3.2. These statistics include loan instruments, 
but also debt securities. In the case of the United States¹, we notice that the credit amount
for households² is close to the figure for non-financial business. We also observe the significant
share of consumer credit and the strong growth of student loans. Figure 3.2 illustrates
the evolution of debt outstanding³ for different countries: China, United Kingdom, Japan,
United States and the Euro area. In China, the annual growth rate has exceeded 20% over the
last five years. Even if credit for households develops much faster than credit for corporations,
it only represents 24% of the total credit market of the private non-financial sector.
The Chinese market contrasts with developed markets where the share of household credit
is larger⁴ and growth rates have been almost flat since the 2008 financial crisis. The Japanese
case is also very specific, because this country experienced a strong financial crisis after 
the bursting of a bubble in the 1990s. At that time, the Japanese market was the world’s 
leading market followed by the United States. 


3.1.2 The bond market 


Contrary to loan instruments, bonds are debt securities that are traded in a financial 
market. The primary market concerns the issuance of bonds whereas bond trading is or- 
ganized through the secondary market. The bond issuance market is dominated by two 
sectors: central and local governments (including public entities) and corporates. This is 
the principal financing source for government projects and public budget deficits. Large 
corporates also use extensively the bond market for investments, business expansions and 
external growth. The distinction government bonds/corporate bonds was crucial before the 
2008 Global Financial Crisis. Indeed, it was traditionally believed that government bonds 
(in developed countries) were not risky because the probability of default was very low. In 
this case, the main risk was the interest rate risk, which is a market risk. Conversely, corpo- 
rate bonds were supposed to be risky because the probability of default was higher. Besides 
the interest rate risk, it was important to take into account the credit risk. Bonds issued 
from the financial and banking sector were considered as low risk investments. Since 2008, 


¹Data are from the statistical release Z.1 “Financial Accounts of the United States”. They are available
from the website of the Federal Reserve System: https://www.federalreserve.gov/releases/z1 or more
easily with the database of the Federal Reserve Bank of St. Louis: https://fred.stlouisfed.org.

²Data for households include non-profit institutions serving households (NPISH).

³Data are collected by the Bank for International Settlements and are available on the website of the
BIS: https://www.bis.org/statistics. The series are adjusted for breaks (Dembiermont et al., 2013) and
we use the average exchange rate from 2000 to 2014 in order to obtain credit amounts in USD.

⁴This is especially true in the UK and the US.
FIGURE 3.1: Credit debt outstanding in the United States (in $ tn)


Source: Board of Governors of the Federal Reserve System (2019). 
FIGURE 3.2: Credit to the private non-financial sector (in $ tn)


Source: Bank for International Settlements (2019) and author’s calculations. 


128 Handbook of Financial Risk Management 


TABLE 3.1: Debt securities by residence of issuer (in $ bn)

                    Dec. 2004   Dec. 2007   Dec. 2010   Dec. 2017
Canada    Gov.          682         841        1149        1264
          Fin.          283         450         384         655
          Corp.         212         248         326         477
          Total        1180        1544        1863        2400
France    Gov.         1236        1514        1838        2258
          Fin.          968        1619        1817        1618
          Corp.         373         382         483         722
          Total        2576        3515        4138        459…
Germany   Gov.         1380        1717        2040        1939
          Fin.         2296        2766        2283        1550
          Corp.         133         174         168         222
          Total        3809        4657        4491        37…2
Italy     Gov.         1637        1928        2069        2292
          Fin.          772        1156        1403         834
          Corp.          68          95         121         174
          Total        2477        3178        359…        3299
Japan     Gov.         6336        6315       10173        9477
          Fin.         2548        2775        3451        2475
          Corp.        1012         762         980         742
          Total        9896        9852       14604       12694
Spain     Gov.          462         498         796        1186
          Fin.          434        1385        1442         785
          Corp.          15          19          19          44
          Total         910        1901        2256        2015
UK        Gov.          798        1070        1674        2785
          Fin.         1775        3127        3061        2689
          Corp.         452         506         473         533
          Total        3027        4706        5210        601…
US        Gov.         6459        7487       12072       17592
          Fin.        12706       17604       15666       15557
          Corp.        3004        3348        3951        6137
          Total       22371       28695       31960       39504


Source: Bank for International Settlements (2019). 


this difference between non-risky and risky bonds has disappeared, meaning that all issuers 
are risky. The 2008 GFC also had another important consequence on the bond market. Today, it is
less liquid, even for sovereign bonds. Liquidity risk is then a concern when measuring
and managing the risk of a bond portfolio. This point is developed in Chapter 6. 


3.1.2.1 Statistics of the bond market 


In Table 3.1, we indicate the outstanding amount of debt securities by residence of
issuer⁵. The total is split into three sectors: general governments (Gov.), financial corpora-
tions (Fin.) and non-financial corporations (Corp.). In most countries, debt securities issued 
by general governments largely dominate, except in the UK and US where debt securities 


⁵The data are available on the website of the BIS: https://www.bis.org/statistics.
issued by financial corporations (banks and other financial institutions) are more impor-
tant. The share of non-financial business varies considerably from one country to another. 
For instance, it represents less than 10% in Germany, Italy, Japan and Spain, whereas it is 
equal to 20% in Canada. The total amount of debt securities tends to rise, with the notable 
exception of Germany, Japan and Spain. 
FIGURE 3.3: US bond market outstanding (in $ tn)


Source: Securities Industry and Financial Markets Association (2019a). 


The analysis of the US market is particularly interesting and relevant. Using the data
collected by the Securities Industry and Financial Markets Association⁶ (SIFMA), we have
reported in Figure 3.3 the evolution of the outstanding amount for the following sectors: municipal
bonds, treasury bonds, mortgage-related bonds, corporate related debt, federal agency
securities, money markets and asset-backed securities. We notice an important growth at
the beginning of the 2000s (see also Figure 3.4), followed by a slowdown after 2008.
However, the debt outstanding continues to grow because the average maturity of new issuance
increases. Another remarkable fact is the fall of the liquidity, which can be measured
by the average daily volume (ADV). Figure 3.5 shows that the ADV of treasury bonds has
remained constant since 2000 whereas the outstanding amount has been multiplied by four
during the same period. We also notice that the turnover of US bonds mainly concerns
treasury and agency MBS bonds. The liquidity of the other sectors is very poor. For instance,
according to SIFMA (2019a), the ADV of US corporate bonds was less than $30 bn
in 2014, which is 22 times lower than the ADV for treasury bonds⁷.


⁶Data are available on the website of the SIFMA: https://www.sifma.org/resources/archive/research/.
⁷However, the ratio between their outstanding amounts is only 1.6.
FIGURE 3.4: US bond market issuance (in $ tn)


Source: Securities Industry and Financial Markets Association (2019a). 
3.1.2.2 Bond pricing


We first explain how to price a bond by only considering the interest rate risk. Then, 
we introduce the default risk and define the concept of credit spread, which is key in credit 
risk modeling. 


[Timeline: coupons $C(t_m)$ paid at the dates $t_1, t_2, \ldots, t_{n_C}$; notional $N$ paid at the maturity $T$; horizontal axis: time]

FIGURE 3.6: Cash flows of a bond with a fixed coupon rate


Without default risk We consider that the bond pays coupons $C(t_m)$ with fixing dates
$t_m$ and the notional $N$ (or the par value) at the maturity date $T$. We have reported an
example of a cash flows scheme in Figure 3.6. Knowing the yield curve⁸, the price of the
bond at the inception date $t_0$ is the sum of the present values of all the expected coupon
payments and the par value:

$$P_{t_0} = \sum_{m=1}^{n_C} C(t_m) B_{t_0}(t_m) + N B_{t_0}(T)$$

where $n_C$ is the number of coupon payments and $B_t(t_m)$ is the discount factor at time $t$
for the maturity date $t_m$. When the valuation date is not the issuance date, the previous
formula remains valid if we take into account the accrued interests. In this case, the buyer
of the bond has the benefit of the next coupon. The price of the bond then satisfies:

$$P_t + AC_t = \sum_{t_m \geq t} C(t_m) B_t(t_m) + N B_t(T) \tag{3.2}$$


⁸A convenient way to define the yield curve is to use a parametric model for the zero-coupon rates $R_t(T)$.
The most famous model is the parsimonious functional form proposed by Nelson and Siegel (1987):

$$R_t(T) = \theta_1 + \theta_2 \left( \frac{1 - \exp(-(T - t)/\theta_4)}{(T - t)/\theta_4} \right) + \theta_3 \left( \frac{1 - \exp(-(T - t)/\theta_4)}{(T - t)/\theta_4} - \exp(-(T - t)/\theta_4) \right) \tag{3.1}$$

This is a model with four parameters: $\theta_1$ is a parameter of level, $\theta_2$ is a parameter of rotation, $\theta_3$ controls
the shape of the curve and $\theta_4$ locates the break of the curve. We also note that the short-term
and long-term interest rates $R_t(t)$ and $R_t(\infty)$ are respectively equal to $\theta_1 + \theta_2$ and $\theta_1$.
Here, $AC_t$ is the accrued coupon:

$$AC_t = C(t_c) \cdot \frac{t - t_c}{365}$$

and $t_c$ is the last coupon payment date with $c = \{m : t_{m+1} > t, t_m \leq t\}$. $P_t + AC_t$ is called
the ‘dirty price’ whereas $P_t$ refers to the ‘clean price’. The term structure of interest rates
impacts the bond price. We generally distinguish three movements:


1. The movement of level corresponds to a parallel shift of interest rates. 


2. A twist in the slope of the yield curve indicates how the spread between long and 
short interest rates moves. 


3. A change in the curvature of the yield curve affects the convexity of the term structure. 


All these movements are illustrated in Figure 3.7. 
FIGURE 3.7: Movements of the yield curve


The yield to maturity $y$ of a bond is the constant discount rate which returns its market
price:

$$\sum_{t_m \geq t} C(t_m) e^{-(t_m - t) y} + N e^{-(T - t) y} = P_t + AC_t$$

We also define the sensitivity⁹ $S$ of the bond price as the derivative of the clean price $P_t$
with respect to the yield to maturity $y$:

$$S = \frac{\partial P_t}{\partial y} = -\sum_{t_m \geq t} (t_m - t) C(t_m) e^{-(t_m - t) y} - (T - t) N e^{-(T - t) y}$$


⁹This sensitivity is also called the \$-duration or DV01.
It indicates how the P&L $\Pi$ of a long position on the bond moves when the yield to maturity
changes:

$$\Pi \approx S \times \Delta y$$

Because $S < 0$, the bond price is a decreasing function with respect to interest rates. This
implies that an increase of interest rates reduces the value of the bond portfolio.
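
The two definitions above can be checked numerically. The following is a minimal sketch, not the book's own implementation: it assumes continuously compounded discounting as in the formulas above, solves for the yield to maturity by bisection (a choice made here for simplicity), and uses a hypothetical three-year bond whose cash flows are illustrative only. The function names are ours.

```python
import math

def dirty_price(y, cash_flows):
    """Present value of (time, amount) cash flows at a continuously compounded yield y."""
    return sum(cf * math.exp(-t * y) for t, cf in cash_flows)

def yield_to_maturity(price, cash_flows, lo=-0.5, hi=1.0, tol=1e-12):
    """Solve dirty_price(y) = price by bisection (the price is decreasing in y)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dirty_price(mid, cash_flows) > price:
            lo = mid   # model price too high: the yield must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def sensitivity(y, cash_flows):
    """S = dP/dy = -sum of (t_m - t) * cash flow * exp(-(t_m - t) * y)."""
    return -sum(t * cf * math.exp(-t * y) for t, cf in cash_flows)

# Hypothetical bond (illustration only): 3 years, 5% annual coupon, notional 100, t = 0
cash_flows = [(1.0, 5.0), (2.0, 5.0), (3.0, 105.0)]
price = dirty_price(0.03, cash_flows)      # market price implied by a 3% yield
y = yield_to_maturity(price, cash_flows)   # recovers the 3% yield
S = sensitivity(y, cash_flows)             # negative, as stated in the text
pnl_approx = S * 0.0010                    # approximate P&L for a +10 bps move in y
```

Because the accrued coupon does not depend on $y$, differentiating the dirty price or the clean price gives the same sensitivity.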


Example 21 We assume that the term structure of interest rates is generated by the Nelson-
Siegel model with θ₁ = 5%, θ₂ = −5%, θ₃ = 6% and θ₄ = 10. We consider a bond with a
constant annual coupon of 5%. The nominal of the bond is $100. We would like to price the
bond when the maturity T ranges from 1 to 5 years.


TABLE 3.2: Price, yield to maturity and sensitivity of bonds

T    R_t(T)    B_t(T)     P_t        y          S
1    0.52%     99.48    104.45     0.52%    −104.45
2    0.99%     98.03    107.91     0.98%    −210.86
3    1.42%     95.83    110.50     1.39%    −316.77
4    1.80%     93.04    112.36     1.76%    −420.32
5    2.15%     89.82    113.63     2.08%    −520.16


TABLE 3.3: Impact of a parallel shift of the yield curve on the bond with five-year maturity

ΔR (in bps) | $\breve{P}_t$   $\Delta \breve{P}_t$ | $\hat{P}_t$   $\Delta \hat{P}_t$ | S × Δy
    −50     |    116.26    2.63   |    116.26    2.63   |   2.60
    −30     |    115.20    1.57   |    115.20    1.57   |   1.56
    −10     |    114.15    0.52   |    114.15    0.52   |   0.52
      0     |    113.63    0.00   |    113.63    0.00   |   0.00
     10     |    113.11   −0.52   |    113.11   −0.52   |  −0.52
     30     |    112.08   −1.55   |    112.08   −1.55   |  −1.56
     50     |    111.06   −2.57   |    111.06   −2.57   |  −2.60


Using the Nelson-Siegel yield curve, we report in Table 3.2 the price of the bond with
maturity T (expressed in years) with a 5% annual coupon. For instance, the price of the
four-year bond is calculated in the following way:

$$P_t = \frac{5}{(1 + 0.52\%)} + \frac{5}{(1 + 0.99\%)^2} + \frac{5}{(1 + 1.42\%)^3} + \frac{105}{(1 + 1.80\%)^4} = \$112.36$$

We also indicate the yield to maturity $y$ (in %) and the corresponding sensitivity $S$. Let $\breve{P}_t$
(resp. $\hat{P}_t$) be the bond price by taking into account a parallel shift $\Delta R$ (in bps) directly on
the zero-coupon rates (resp. on the yield to maturity)¹⁰. The results are given in Table 3.3 in
the case of the bond with a five-year maturity. We verify that the computation based on
the sensitivity provides a good approximation. This method has been already used in the
previous chapter on page 77 to calculate the value-at-risk of bonds.


¹⁰We have:
$$\breve{P}_t = \sum_{t_m \geq t} C(t_m) e^{-(t_m - t)(R_t(t_m) + \Delta R)} + N e^{-(T - t)(R_t(T) + \Delta R)}$$
and:
$$\hat{P}_t = \sum_{t_m \geq t} C(t_m) e^{-(t_m - t)(y + \Delta R)} + N e^{-(T - t)(y + \Delta R)}$$


FIGURE 3.8: Cash flows of a bond with default risk
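
The prices of Table 3.2 can be reproduced with a few lines of code. The sketch below is only an illustration under stated assumptions: it uses the Nelson-Siegel form of Equation (3.1) with the parameters of Example 21, and it assumes continuously compounded discount factors $B_t(T) = e^{-(T - t) R_t(T)}$, which is the convention that matches the $B_t(T)$ column of the table. The function names and the `shift` argument are ours.

```python
import math

THETA = (0.05, -0.05, 0.06, 10.0)   # theta_1, ..., theta_4 from Example 21

def nelson_siegel(tau, theta=THETA):
    """Zero-coupon rate R_t(T) for a residual maturity tau = T - t > 0 (Equation 3.1)."""
    t1, t2, t3, t4 = theta
    x = tau / t4
    slope = (1.0 - math.exp(-x)) / x
    return t1 + t2 * slope + t3 * (slope - math.exp(-x))

def discount_factor(tau, shift=0.0):
    """B_t(T) = exp(-(T - t) * (R_t(T) + shift)), assuming continuous compounding."""
    return math.exp(-tau * (nelson_siegel(tau) + shift))

def bond_price(maturity, coupon=5.0, notional=100.0, shift=0.0):
    """Price at issuance of an annual-coupon bond with an integer maturity in years."""
    coupons = sum(coupon * discount_factor(m, shift) for m in range(1, maturity + 1))
    return coupons + notional * discount_factor(maturity, shift)

# Prices for maturities 1 to 5 years (compare with the P_t column of Table 3.2)
prices = {T: bond_price(T) for T in range(1, 6)}

# Five-year bond after a +50 bps parallel shift of the zero-coupon rates (Table 3.3)
shifted_5y = bond_price(5, shift=0.0050)
```

Passing a non-zero `shift` applies the parallel move $\Delta R$ directly to the zero-coupon rates, which corresponds to the first price column of Table 3.3.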


With default risk In the previous paragraph, we assumed that there is no default risk.
However, if the issuer defaults at time $\tau$ before the bond maturity $T$, some coupons and
the notional are not paid. In this case, the buyer of the bond recovers part of the notional
after the default time. An illustration is given in Figure 3.8. In terms of cash flows, we have
therefore:


• the coupons $C(t_m)$ if the bond issuer does not default before the coupon date $t_m$:

$$\sum_{t_m \geq t} C(t_m) \cdot \mathbb{1}\{\tau > t_m\}$$

• the notional if the bond issuer does not default before the maturity date:

$$N \cdot \mathbb{1}\{\tau > T\}$$

• the recovery part if the bond issuer defaults before the maturity date:

$$R \cdot N \cdot \mathbb{1}\{\tau \leq T\}$$

where $R$ is the corresponding recovery rate.


If we assume that the recovery part is exactly paid at the default time $\tau$, we deduce that
the stochastic discounted value of the cash flow leg is:

$$SV_t = \sum_{t_m \geq t} C(t_m) e^{-\int_t^{t_m} r_s \, ds} \cdot \mathbb{1}\{\tau > t_m\} + N e^{-\int_t^{T} r_s \, ds} \cdot \mathbb{1}\{\tau > T\} + R N e^{-\int_t^{\tau} r_s \, ds} \cdot \mathbb{1}\{\tau \leq T\}$$
The price of the bond is the expected value of the stochastic discounted value¹¹:
$P_t + AC_t = \mathbb{E}[SV_t \mid \mathcal{F}_t]$. If we assume that (H1) the default time and the interest rates are independent
and (H2) the recovery rate is known and not stochastic, we obtain the following closed-form
formula:

$$P_t + AC_t = \sum_{t_m \geq t} C(t_m) B_t(t_m) S_t(t_m) + N B_t(T) S_t(T) + R N \int_t^T B_t(u) f_t(u) \, du \tag{3.3}$$

where $S_t(u)$ is the survival function at time $u$ and $f_t(u)$ the associated density function¹².


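Remark 20 If the issuer is not risky, we have $S_t(u) = 1$ and $f_t(u) = 0$. In this case,
Equation (3.3) reduces to Equation (3.2).


Remark 21 If we consider an exponential default time with parameter $\lambda$, that is $\tau \sim \mathcal{E}(\lambda)$, we
have $S_t(u) = e^{-\lambda (u - t)}$, $f_t(u) = \lambda e^{-\lambda (u - t)}$ and:

$$P_t + AC_t = \sum_{t_m \geq t} C(t_m) B_t(t_m) e^{-\lambda (t_m - t)} + N B_t(T) e^{-\lambda (T - t)} + \lambda R N \int_t^T B_t(u) e^{-\lambda (u - t)} \, du$$

If we assume a flat yield curve, $R_t(u) = r$, we obtain:

$$P_t + AC_t = \sum_{t_m \geq t} C(t_m) e^{-(r + \lambda)(t_m - t)} + N e^{-(r + \lambda)(T - t)} + \lambda R N \left( \frac{1 - e^{-(r + \lambda)(T - t)}}{r + \lambda} \right)$$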


Example 22 We consider a bond with ten-year maturity. The notional is $100 whereas the
annual coupon rate is equal to 4.5%.


If we consider that r = 0, the price of the non-risky bond is $145. With r = 5%, the
price becomes $95.19. Let us now take into account the default risk. We assume that the
recovery rate R is 40%. If λ = 2% (resp. 10%), the price of the risky bond is $86.65 (resp.
$64.63). If the yield curve is not flat, we must use the general formula (3.3) to compute
the price of the bond. In this case, the integral is evaluated with a numerical integration
procedure, typically a Gauss-Legendre quadrature¹³. For instance, if we consider the yield
curve defined in Example 21, the bond price is equal to $110.13 if there is no default risk,
$99.91 if λ = 2% and $73.34 if λ = 10%.
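
The figures of Example 22 can be recovered with a direct implementation of Equation (3.3) for an exponential default time. This is a sketch under stated assumptions rather than the author's code: coupons are paid annually at dates 1, ..., 10, discount factors are continuously compounded, and the recovery integral is evaluated with a composite Simpson rule, which is swapped in here for the Gauss-Legendre quadrature mentioned in the text in order to stay within the standard library. All function names are ours.

```python
import math

THETA = (0.05, -0.05, 0.06, 10.0)   # Nelson-Siegel parameters of Example 21

def ns_discount(u):
    """B_t(t + u) under the Nelson-Siegel curve of Example 21 (B = 1 when u = 0)."""
    if u == 0.0:
        return 1.0
    t1, t2, t3, t4 = THETA
    x = u / t4
    slope = (1.0 - math.exp(-x)) / x
    rate = t1 + t2 * slope + t3 * (slope - math.exp(-x))
    return math.exp(-u * rate)

def flat_discount(r):
    """Discount function for a flat yield curve at rate r."""
    return lambda u: math.exp(-r * u)

def risky_bond_price(discount, lam, recovery, maturity=10, coupon=4.5,
                     notional=100.0, steps=1000):
    """Equation (3.3) with S_t(u) = exp(-lam * u) and f_t(u) = lam * exp(-lam * u)."""
    survival = lambda u: math.exp(-lam * u)
    coupons = sum(coupon * discount(m) * survival(m) for m in range(1, maturity + 1))
    principal = notional * discount(maturity) * survival(maturity)
    # Composite Simpson rule on [0, maturity] for the recovery integral (steps is even)
    h = maturity / steps
    g = lambda u: discount(u) * lam * survival(u)
    total = g(0.0) + g(float(maturity))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3.0
    return coupons + principal + recovery * notional * integral

flat_no_default = risky_bond_price(flat_discount(0.05), 0.0, 0.40)   # flat curve, r = 5%
flat_lam_2 = risky_bond_price(flat_discount(0.05), 0.02, 0.40)       # lambda = 2%
flat_lam_10 = risky_bond_price(flat_discount(0.05), 0.10, 0.40)      # lambda = 10%
ns_no_default = risky_bond_price(ns_discount, 0.0, 0.40)             # Example 21 curve
ns_lam_2 = risky_bond_price(ns_discount, 0.02, 0.40)
ns_lam_10 = risky_bond_price(ns_discount, 0.10, 0.40)
```

With a flat curve the numerical integral agrees with the closed-form recovery term of Remark 21, so the same function covers both cases of the example.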


The yield to maturity of the defaultable bond is computed exactly in the same way as
without default risk. The credit spread $s$ is then defined as the difference of the yield to
maturity with default risk $y$ and the yield to maturity without default risk $y^{\star}$:

$$s = y - y^{\star} \tag{3.4}$$


¹¹It is also called the present value.
¹²We have:
$$S_t(u) = \mathbb{E}\left[\mathbb{1}\{\tau > u\} \mid \tau > t\right] = \Pr\{\tau > u \mid \tau > t\}$$
The density function is then given by $f_t(u) = -\partial_u S_t(u)$.
¹³See Appendix A.1.2.3 on page 1037 for a primer on numerical integration.
This spread is a credit risk measure and is an increasing function of the default risk. Reconsider
the simple model with a flat yield curve and an exponential default time. If the
recovery rate $R$ is equal to zero, we deduce that the yield to maturity of the defaultable
bond is $y = r + \lambda$. It follows that the credit spread is equal to the parameter $\lambda$ of the exponential
distribution. Moreover, if $\lambda$ is relatively small (less than 20%), the annual probability
of default is:

$$\mathrm{PD} = 1 - S_t(t + 1) = 1 - e^{-\lambda} \approx \lambda$$

In this case, the credit spread is approximately equal to the annual default probability
($s \approx \mathrm{PD}$).
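
The relationship between the credit spread, the intensity and the recovery rate can be illustrated in the flat-curve model. The sketch below makes several assumptions that are ours: the ten-year 4.5% bond of Example 22, a flat rate r = 5%, the closed-form price of Remark 21, and a bisection solver for the yield to maturity. It is not the computation behind Table 3.4, which relies on the Nelson-Siegel curve.

```python
import math

def price_flat(r, lam, recovery, maturity=10, coupon=4.5, notional=100.0):
    """Flat-curve price of Remark 21 (annual coupons, exponential default time)."""
    k = r + lam
    coupons = sum(coupon * math.exp(-k * m) for m in range(1, maturity + 1))
    annuity = maturity if k == 0.0 else (1.0 - math.exp(-k * maturity)) / k
    return coupons + notional * math.exp(-k * maturity) + lam * recovery * notional * annuity

def ytm(price, maturity=10, coupon=4.5, notional=100.0):
    """Yield to maturity by bisection: the promised cash flows are discounted at y."""
    lo, hi = -0.5, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        pv = sum(coupon * math.exp(-mid * m) for m in range(1, maturity + 1))
        pv += notional * math.exp(-mid * maturity)
        if pv > price:
            lo = mid   # present value too high: the yield must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def credit_spread(r, lam, recovery):
    """s = y - y*, where y* is the yield of the same bond without default risk."""
    return ytm(price_flat(r, lam, recovery)) - ytm(price_flat(r, 0.0, recovery))

lam = 0.02
spread_zero_recovery = credit_spread(0.05, lam, 0.0)   # equals lambda when R = 0
spread_40_recovery = credit_spread(0.05, lam, 0.40)    # smaller than lambda
annual_pd = 1.0 - math.exp(-lam)                       # close to lambda for small lambda
```

With a zero recovery rate the spread coincides with the intensity, and a positive recovery rate lowers it, which is the pattern discussed in the text.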

If we reuse our previous example with the yield curve specified in Example 21, we obtain
the results reported in Table 3.4. For instance, the yield to maturity of the bond is equal
to 3.24% without default risk. If λ and R are set to 200 bps and 0%, the yield to maturity
becomes 5.22%, which implies a credit spread of 198.1 bps. If the recovery rate is higher,
the credit spread decreases. Indeed, with λ equal to 200 bps, the credit spread is equal to
117.1 bps if R = 40% and only 41.7 bps if R = 80%.


TABLE 3.4: Computation of the credit spread s

   R         λ         PD        P_t       y         s
(in %)   (in bps)   (in bps)   (in $)   (in %)   (in bps)
   0          0        0.0      110.1     3.24       0.0
             10       10.0      109.2     3.34       9.9
            200      198.0       93.5     5.22     198.1
           1000      951.6       50.4    13.13     988.9
  40          0        0.0      110.1     3.24       0.0
             10       10.0      109.6     3.30       6.0
            200      198.0       99.9     4.41     117.1
           1000      951.6       73.3     8.23     498.8
  80          0        0.0      110.1     3.24       0.0
             10       10.0      109.9     3.26       2.2
            200      198.0      106.4     3.66      41.7
           1000      951.6       96.3     4.85     161.4


Remark 22 In the case of loans, we do not calculate a capital requirement for market
risk, only a capital requirement for credit risk. The reason is that there is no market price
of the loan, because it cannot be traded in an exchange. For bonds, we calculate a capital
requirement for both market and credit risks. In the case of the market risk, risk factors
are the yield curve rates, but also the parameters associated to the credit risk, for instance
the default probabilities and the recovery rate. In this context, market risk has a credit
component. To illustrate this property, we consider the previous example and we assume
that $\lambda_t$ varies across time whereas the recovery rate $R$ is equal to 40%. In Figure 3.9, we
show the evolution of the process $\lambda_t$ for the next 10 years (top panel) and the clean price¹⁴
$P_t$ (bottom/left panel). If we suppose now that the issuer defaults suddenly at time t = 6.25,
we observe a jump in the clean price (bottom/right panel). It is obvious that the market risk
takes into account the short-term evolution of the credit component (or the smooth part), but
incorporates neither the jump risk (or the discontinuous part) nor the large uncertainty
on the recovery price. This is why these risks are covered by credit risk capital requirements.


¹⁴We assume that the yield curve remains constant.
FIGURE 3.9: Difference between market and credit risks for a bond


3.1.3 Securitization and credit derivatives 


Since the 1990s, banks have developed credit transfer instruments in two directions: 
credit securitization and credit derivatives. The term securitization refers to the process of 
transforming illiquid and non-tradable assets into tradable securities. Credit derivatives are 
financial instruments whose payoff explicitly depends on credit events like the default of 
an issuer. These two topics are highly connected because credit securities can be used as 
underlying assets of credit derivatives. 


3.1.3.1 Credit securitization 


According to AFME (2019), the outstanding amount of securitization is close to €9 tn.
Figure 3.10 shows the evolution of issuance in Europe and the US since 2000. We observe that
the financial crisis had a negative impact on the growth of credit securitization, especially
in Europe, which represents less than 20% of this market. This market is therefore dominated
by the US, followed by the UK, France, Spain, the Netherlands and Germany.

Credit securities are better known as asset-backed securities (ABS), even if this term is 
generally reserved to assets that are not mortgage, loans or corporate bonds. In its simplest 
form, an ABS is a bond whose coupons are derived from a collateral pool of assets. We 
generally make the following distinction with respect to the type of collateral assets: 


• Mortgage-backed securities (MBS)

    – Residential mortgage-backed securities (RMBS)
    – Commercial mortgage-backed securities (CMBS)

• Collateralized debt obligations (CDO)

    – Collateralized loan obligations (CLO)
    – Collateralized bond obligations (CBO)

• Asset-backed securities (ABS)

    – Auto loans
    – Credit cards and revolving credit
    – Student loans


FIGURE 3.10: Securitization in Europe and US (in € tn)

Source: Association for Financial Markets in Europe (2019).


MBS are securities that are backed by residential and commercial mortgage loans. The 
most basic structure is a pass-through security, where the coupons are the same for all 
the investors and are proportional to the revenue of the collateral pool. Such structure is 
shown in Figure 3.11. The originator (e.g. a bank) sells a pool of debt to a special purpose 
vehicle (SPV). The SPV is an ad-hoc legal entity¹⁵ whose sole function is to hold the loans
as assets and issue the securities for investors. In the pass-through structure, the securities 
are all the same and the cash flows paid to investors are directly proportional to interests 
and principals of collateral assets. More complex structures are possible with several classes 
of bonds (see Figure 3.12). In this case, the cash flows differ from one type of securities 
to another one. The most famous example is the collateralized debt obligation, where the 
securities are divided into tranches. This category includes also collateralized mortgage 
obligations (CMO), which are both MBS and CDO. The two other categories of CDOs are 
CLOs, which are backed by corporate bank debt (e.g. SME loans) and CBOs, which are 
backed by bonds (e.g. high yield bonds). Finally, pure ABS principally concern consumer
credit such as auto loans, credit cards and student loans.


¹⁵It may be a subsidiary of the originator.
FIGURE 3.11: Structure of pass-through securities

FIGURE 3.12: Structure of pay-through securities


In Table 3.5, we report some statistics about US mortgage-backed securities. SIFMA 
(2019b) makes the distinction between agency MBS and non-agency MBS. After the Great 
Depression, the US government created three public entities to promote home ownership and 
provide insurance of mortgage loans: the Federal National Mortgage Association (FNMA or 
Fannie Mae), the Federal Home Loan Mortgage Corporation (FHLMC or Freddie Mac) and 
the Government National Mortgage Association (GNMA or Ginnie Mae). Agency MBS refer 
to securities guaranteed by these three public entities and represent the main part of the US 
MBS market. This has been especially true since the 2008 financial crisis. Indeed, non-agency MBS
represented 53.5% of the issuance in 2006 and only 3.5% in 2012. Because agency MBS are
principally based on home mortgage loans, the RMBS market is ten times larger than
the CMBS market. CDO and ABS markets are smaller and represent together about $1.5
tn (see Table 3.6). The CDO market strongly suffered from the subprime crisis¹⁶. During
the same period, the structure of the ABS market changed with an increasing proportion 
of ABS backed by auto loans and a fall of ABS backed by credit cards and student loans. 


Remark 23 Even if credit securities may be viewed as bonds, their pricing is not straight- 
forward. Indeed, the measure of the default probability and the recovery depends on the 


¹⁶For instance, the issuance of US CDO was less than $10 bn in 2010.
TABLE 3.5: US mortgage-backed securities

             Agency             Non-agency         Total
Year      MBS      CMO       CMBS     RMBS      (in $ bn)
                        Issuance
2002     57.5%    23.6%      2.2%    16.7%        2515
2006     33.6%    11.0%      7.9%    47.5%        2691
2008     84.2%    10.8%      1.2%     3.8%        1394
2010     71.0%    24.5%      1.2%     3.3%        2013
2012     80.1%    16.4%      2.2%     1.3%        2195
2014     68.7%    19.2%      7.0%     5.1%        1440
2016     76.3%    15.7%      3.8%     4.2%        2044
2018     69.2%    16.6%      4.7%     9.5%        1899
                    Outstanding amount
2002     59.7%    17.4%      5.6%    17.2%        5289
2006     45.7%    14.9%      8.3%    31.0%        8390
2008     52.4%    14.0%      8.8%    24.9%        9467
2010     59.2%    14.6%      8.1%    18.1%        9258
2012     64.0%    14.8%      7.2%    14.0%        8838
2014     68.0%    13.7%      7.1%    11.2%        8842
2016     72.4%    12.3%      5.9%     9.5%        9023
2018     74.7%    11.3%      5.6%     8.4%        9732

Source: Securities Industry and Financial Markets Association (2019b,c) and author’s calculations.
TABLE 3.6: US asset-backed securities

          Auto     CDO     Credit   Equip-            Student     Total
Year     Loans   & CLO     Cards     ment     Other    Loans    (in $ bn)
Issuance
2002     34.9%   21.0%     25.2%     2.6%      6.8%     9.5%        269
2006     13.5%   60.1%      9.3%     2.2%      4.6%    10.3%        658
2008     16.5%   37.8%     25.9%     1.3%      5.4%    13.1%        215
2010     46.9%    6.4%      5.2%     7.0%     22.3%    12.3%        126
2012     33.9%   23.1%     12.5%     7.1%     13.7%     9.8%        259
2014     25.2%   35.6%     13.1%     5.2%     17.0%     4.0%        393
2016     28.3%   36.8%      8.3%     4.6%     16.9%     5.1%        325
2018     20.8%   54.3%      6.1%     5.1%     10.1%     3.7%        517
Outstanding amount
2002     20.7%   28.6%     32.5%     4.1%      7.5%     6.6%        905
2006     11.8%   49.3%     17.6%     3.1%      6.0%    12.1%      1 657
2008      7.7%   53.5%     17.8%     2.4%      6.2%    13.0%      1 830
2010      7.6%   52.4%     14.4%     2.4%      7.1%    16.1%      1 508
2012     11.0%   48.7%     10.0%     3.3%      8.7%    18.4%      1 280
2014     13.2%   46.8%     10.1%     3.9%      9.8%    16.2%      1 349
2016     13.9%   48.0%      9.3%     3.7%     11.6%    13.5%      1 397
2018     13.3%   48.2%      7.4%     5.0%     16.0%    10.2%      1 677

Source: Securities Industry and Financial Markets Association (2019b,c) and author’s calculations.
FIGURE 3.13: Outstanding amount of credit default swaps (in $ tn)


Source: Bank for International Settlements (2019). 


characteristics of the collateral assets (individual default probabilities and recovery rates), 
but also on the correlation between these risk factors. Measuring credit risk of such securities 
is then a challenge. Another issue concerns design and liquidity problems faced when
packaging and investing in these assets¹⁷ (Duffie and Rahi, 1995; DeMarzo and Duffie, 1999).
This explains why credit securities suffered heavily during the 2008 financial crisis, even if
some of them were not linked to subprime mortgages. In fact, securitization markets pose
a potential risk to financial stability (Segoviano et al., 2013). This is a topic we will return 
to in Chapter 8, which deals with systemic risk. 


3.1.3.2 Credit default swap 


A credit default swap (CDS) may be defined as an insurance derivative, whose goal is 
to transfer the credit risk from one party to another. In a standard contract, the protection 
buyer makes periodic payments (known as the premium leg) to the protection seller. In 
return, the protection seller pays a compensation (known as the default leg) to the protection 
buyer in the case of a credit event, which can be a bankruptcy, a failure to pay or a 
debt restructuring. In its most basic form, the credit event refers to an issuer (sovereign 
or corporate) and this corresponds to a single-name CDS. If the credit event relates to 
a universe of different entities, we speak about a multi-name CDS. In Figure 3.13, we 
report the evolution of outstanding amount of CDS since 2007. The growth of this market 
was very strong before 2008 with a peak close to $60 tn. The situation today is different, 
because the market of single-name CDS stabilized whereas the market of basket default 
swaps continues to fall significantly. Nevertheless, it remains an important OTC market 
with a total outstanding around $9 tn. 


17 The liquidity issue is treated in Chapter 6. 
FIGURE 3.14: Cash flows of a single-name credit default swap


In Figure 3.14, we report the mechanisms of a single-name CDS. The contract is defined 
by a reference entity (the name), a notional principal \(N\), a maturity or tenor \(T\), a payment
frequency, a recovery rate \(R\) and a coupon rate¹⁸ \(c\). From the inception date \(t\) to the
maturity date \(T\) or the default time \(\tau\), the protection buyer pays a fixed payment, which
is equal to \(c \cdot N \cdot \Delta t_m\) at the fixing date \(t_m\) with \(\Delta t_m = t_m - t_{m-1}\). This means that the
annual premium leg is equal to \(c \cdot N\). If there is no credit event, the protection buyer will
pay a total of \(c \cdot N \cdot (T - t)\). In case of a credit event before the maturity, the protection
seller will compensate the protection buyer and will pay \((1 - R) \cdot N\).


Example 23 We consider a credit default swap, whose notional principal is $10 mn, ma- 
turity is 5 years and payment frequency is quarterly. The credit event is the bankruptcy of 
the corporate entity A. We assume that the recovery rate is set to 40% and the coupon rate 
is equal to 2%. 


Because the payment frequency is quarterly, there are 20 fixing dates, which are 3M, 6M, 
9M, 1Y, ..., 5Y. Each quarter, if the corporate A does not default, the protection buyer 
pays a premium, which is approximately equal to $10 mn × 2% × 0.25 = $50 000. If there is no
default during the next five years, the protection buyer will pay a total of $50 000 × 20 = $1
mn whereas the protection seller will pay nothing. Suppose now that the corporate defaults
two years and four months after the CDS inception date. In this case, the protection buyer
will pay $50 000 during 9 quarters and will receive the protection leg from the protection
seller at the default time. This protection leg is equal to (1 − 40%) × $10 mn = $6 mn.
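
The cash flows of Example 23 can be checked with a few lines of Python. This is only a minimal sketch: the function name `cds_cash_flows` and its arguments are illustrative choices, time is measured in months to keep the arithmetic exact, and the accrued premium at default is ignored, as in the example.

```python
def cds_cash_flows(notional, coupon, recovery, maturity_months,
                   period_months=3, default_month=None):
    """Return (total premium paid by the buyer, default leg received)."""
    # Premium paid at each fixing date: c * N * delta_t
    premium = coupon * notional * period_months / 12.0
    n_fixings = maturity_months // period_months
    if default_month is None or default_month > maturity_months:
        # No credit event before maturity: all premiums paid, nothing received
        return premium * n_fixings, 0.0
    # Premiums are paid only at the fixing dates that precede the default
    # (the accrued premium is ignored here)
    n_paid = default_month // period_months
    return premium * n_paid, (1.0 - recovery) * notional


# Example 23: $10 mn notional, 2% coupon, 40% recovery, 5Y, quarterly
no_default = cds_cash_flows(10e6, 0.02, 0.40, 60)
with_default = cds_cash_flows(10e6, 0.02, 0.40, 60, default_month=28)
print(no_default)    # total premium and default leg without default
print(with_default)  # default 2 years and 4 months after inception
```

With a default after 28 months, nine quarterly premiums have been paid and the protection seller pays (1 − 40%) of the notional, which matches the figures given in the example.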


To compute the mark-to-market value of a CDS, we use the reduced-form approach as 
in the case of bond pricing. If we assume that the premium is not paid after the default
time \(\tau\), the stochastic discounted value of the premium leg is¹⁹:

\[
\mathrm{SV}_t(\mathrm{PL}) = \sum_{t_m > t} c \cdot N \cdot \Delta t_m \cdot \mathbb{1}\{\tau > t_m\} \cdot e^{-\int_t^{t_m} r(s)\,\mathrm{d}s}
\]

18 We will see that the coupon rate \(c\) is in fact the CDS spread \(s\) for par swaps.
19 In order to obtain a simple formula, we do not deal with the accrued premium (see Remark 26 on page
149).
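Using the standard assumptions that the default time is independent of interest rates and
the recovery rate, we deduce the present value of the premium leg as follows:

\[
\begin{aligned}
\mathrm{PV}_t(\mathrm{PL}) &= \mathbb{E}\left[\left.\sum_{t_m > t} c \cdot N \cdot \Delta t_m \cdot \mathbb{1}\{\tau > t_m\} \cdot e^{-\int_t^{t_m} r(s)\,\mathrm{d}s} \,\right|\, \mathcal{F}_t\right] \\
&= \sum_{t_m > t} c \cdot N \cdot \Delta t_m \cdot \mathbb{E}\left[\mathbb{1}\{\tau > t_m\}\right] \cdot \mathbb{E}\left[e^{-\int_t^{t_m} r(s)\,\mathrm{d}s}\right] \\
&= c \cdot N \cdot \sum_{t_m > t} \Delta t_m \, S_t(t_m) \, B_t(t_m)
\end{aligned}
\]
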

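where \(S_t(u)\) is the survival function at time \(u\). If we assume that the default leg is exactly
paid at the default time \(\tau\), the stochastic discounted value of the default (or protection) leg
is²⁰:

\[
\mathrm{SV}_t(\mathrm{DL}) = (1 - R) \cdot N \cdot \mathbb{1}\{\tau \leq T\} \cdot e^{-\int_t^{\tau} r(s)\,\mathrm{d}s}
\]

It follows that its present value is:

\[
\begin{aligned}
\mathrm{PV}_t(\mathrm{DL}) &= \mathbb{E}\left[\left.(1 - R) \cdot N \cdot \mathbb{1}\{\tau \leq T\} \cdot e^{-\int_t^{\tau} r(s)\,\mathrm{d}s} \,\right|\, \mathcal{F}_t\right] \\
&= (1 - R) \cdot N \cdot \mathbb{E}\left[\mathbb{1}\{\tau \leq T\} \cdot B_t(\tau)\right] \\
&= (1 - R) \cdot N \int_t^T B_t(u) \, f_t(u) \,\mathrm{d}u
\end{aligned}
\]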

where \(f_t(u)\) is the density function associated to the survival function \(S_t(u)\). We deduce
that the mark-to-market of the swap is²¹:

\[
\begin{aligned}
P_t(T) &= \mathrm{PV}_t(\mathrm{DL}) - \mathrm{PV}_t(\mathrm{PL}) \\
&= (1 - R) \cdot N \int_t^T B_t(u) \, f_t(u) \,\mathrm{d}u - c \cdot N \sum_{t_m > t} \Delta t_m \, S_t(t_m) \, B_t(t_m) \\
&= N \left( (1 - R) \int_t^T B_t(u) \, f_t(u) \,\mathrm{d}u - c \cdot \mathrm{RPV}_{01} \right)
\end{aligned}
\tag{3.5}
\]

where \(\mathrm{RPV}_{01} = \sum_{t_m > t} \Delta t_m \, S_t(t_m) \, B_t(t_m)\) is called the risky PV01 and corresponds to the
present value of 1 bp paid on the premium leg. The CDS price is then inversely related
to the spread. At the inception date, the present value of the premium leg is equal to the
present value of the default leg meaning that the CDS spread corresponds to the coupon
rate such that \(P_t^{\mathrm{buyer}} = 0\). We obtain the following expression:

\[
s = \frac{(1 - R) \int_t^T B_t(u) \, f_t(u) \,\mathrm{d}u}{\sum_{t_m > t} \Delta t_m \, S_t(t_m) \, B_t(t_m)}
\tag{3.6}
\]

The spread \(s\) is in fact the fair value coupon rate \(c\) in such a way that the initial value of
the credit default swap is equal to zero.


20 Here the recovery rate \(R\) is assumed to be deterministic.
21 \(P_t\) is the swap price for the protection buyer. We have then \(P_t^{\mathrm{buyer}}(T) = P_t(T)\) and
\(P_t^{\mathrm{seller}}(T) = -P_t(T)\).
We notice that if there is no default risk, this implies that \(S_t(u) = 1\) and we get \(s = 0\).
In the same way, the spread is also equal to zero if the recovery rate is set to 100%. If we
assume that the premium leg is paid continuously, the formula (3.6) becomes:

\[
s = \frac{(1 - R) \int_t^T B_t(u) \, f_t(u) \,\mathrm{d}u}{\int_t^T B_t(u) \, S_t(u) \,\mathrm{d}u}
\]

If the interest rates are equal to zero (\(B_t(u) = 1\)) and the default time is exponential with
parameter \(\lambda\), that is \(S_t(u) = e^{-\lambda (u - t)}\) and \(f_t(u) = \lambda e^{-\lambda (u - t)}\), we get:

\[
s = \frac{(1 - R) \cdot \lambda \cdot \int_t^T e^{-\lambda (u - t)} \,\mathrm{d}u}{\int_t^T e^{-\lambda (u - t)} \,\mathrm{d}u} = (1 - R) \cdot \lambda
\]

If \(\lambda\) is relatively small, we also notice that this relationship can be written as follows:

\[
s \approx (1 - R) \cdot \mathrm{PD}
\]


where PD is the one-year default probability²². This relationship is known as the ‘credit
triangle’ because it is a relationship between three variables where knowledge of any two is
sufficient to calculate the third (O’Kane, 2008). It basically states that the CDS spread is
approximately equal to the one-year loss. The spread also contains the same information
as the survival function and is an increasing function of the default probability. It can
then be interpreted as a credit risk measure of the reference entity.
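
As a quick numerical illustration of the credit triangle, the following sketch compares the exact spread (1 − R)·λ obtained with zero interest rates and a continuously paid premium with the approximation (1 − R)·PD. The function names are illustrative, and the 40% recovery rate is an assumption made only for the printout.

```python
import math


def one_year_pd(lam):
    """One-year default probability for an exponential default time."""
    return 1.0 - math.exp(-lam)


def spread_exact(lam, recovery):
    """Spread with zero rates and continuous premium: s = (1 - R) * lambda."""
    return (1.0 - recovery) * lam


def spread_triangle(lam, recovery):
    """Credit triangle approximation: s ~ (1 - R) * PD."""
    return (1.0 - recovery) * one_year_pd(lam)


for lam in (0.01, 0.05, 0.10, 0.20):
    pd_1y = one_year_pd(lam)
    exact = spread_exact(lam, 0.40) * 1e4         # in bps
    approx = spread_triangle(lam, 0.40) * 1e4     # in bps
    print(f"lambda={lam:.2%}  PD={pd_1y:.2%}  "
          f"exact={exact:.1f} bps  triangle={approx:.1f} bps")
```

The gap between the two columns widens as λ increases, which is why the approximation is stated for relatively small values of λ.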


We recall that the first CDS was traded by J.P. Morgan in 1994 (Augustin et al., 2014). 
The CDS market structure has been organized since then, especially the standardization 
of the CDS contract. Today, CDS agreements are governed by 2003 and 2014 ISDA credit
derivatives definitions. For instance, the settlement of the CDS contract can be either
physical or in cash. In the case of cash settlement, there is a monetary exchange from the
protection seller to the protection buyer²³. In the case of physical settlement, the protection
buyer delivers a bond to the protection seller and receives the notional principal amount.
Because the price of the defaulted bond is equal to \(R \cdot N\), this means that the implied
mark-to-market of this operation is \(N - R \cdot N\) or equivalently \((1 - R) \cdot N\). Of course, physical
settlement is only possible if the reference entity is a bond or if the credit event is based
on the bond default. Whereas physical settlement was prevailing in the 1990s, most of the
settlements are today in cash. Another standardization concerns the price of CDS. With
the exception of very specific cases²⁴, CDS contracts are quoted in (fair) spread expressed
in bps. In Figures 3.15 and 3.16, we show the evolution of some CDS spreads for a five-year 
maturity. We notice the increase of credit spreads since the 2008 financial turmoil and the 


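22 We have:

\[
\begin{aligned}
\mathrm{PD} &= \Pr\{\tau \leq t + 1 \mid \tau > t\} \\
&= 1 - S_t(t + 1) \\
&= 1 - e^{-\lambda} \\
&\approx \lambda
\end{aligned}
\]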


For instance, if A is equal respectively to 1%, 5%, 10% and 20% , the one-year default probability takes the 
values 1.00%, 4.88%, 9.52% and 18.13%. 

23 This monetary exchange is equal to (1 — R) - N. 

24 When the default probability is high (larger than 20%), CDS contracts can be quoted with an upfront
meaning that the protection seller is asking an initial amount to enter into the swap. For instance, it was 
the case of CDS on Greece in spring 2013. 
FIGURE 3.15: Evolution of some sovereign CDS spreads
FIGURE 3.16: Evolution of some financial and corporate CDS spreads

bankruptcy of Lehman Brothers, the sensitivity of German and Italian spreads with
respect to the Eurozone crisis and also the difference in level between the different countries.
Indeed, the spread is globally lower for the US than for Germany or Japan. In the case of Italy,
the spread is high and reached 600 bps in 2012. We observe that the spread of some
corporate entities may be lower than the spread of many developed countries (see Figure
3.16). This is the case of Walmart, whose spread has been lower than 20 bps since 2014. When a
company (or a country) is in great difficulty, the CDS spread explodes as in the case of Ford
in February 2009. CDS spreads can be used to compare the default risk of two entities in
the same sector. For instance, Figure 3.16 shows that the default risk of Citigroup is higher
than that of JPMorgan Chase.


The CDS spread changes over time, but depends also on the maturity or tenor. This 
implies that we have a term structure of credit spreads for a given date t. This term structure 
is known as the credit spread curve and is noted s (T) where T is the maturity time. Figure 
3.17 shows the credit curve for different entities as of 17 September 2015. We notice that 
the CDS spread increases with the maturity. This is the most common case for investment 
grade (IG) entities, whose short-term default risk is low, but long-term default risk is higher. 
Nevertheless, we observe some distinguishing patterns between these credit curves. For 
instance, the credit risk of Germany is lower than the credit risk of US if the maturity is 
less than five years, but it is higher in the long run. There is a difference of 4 bps between 
Google and Apple on average when the time-to-maturity is less than 5 years. In the case of 
10Y CDS, the spread of Apple is 90.8 bps whereas it is only 45.75 bps for Google. 
FIGURE 3.17: Example of CDS spread curves as of 17 September 2015


Remark 24 In other cases, the credit curve may be decreasing (for some high yield cor- 
porates) or have a complex curvature (bell-shaped or U-shaped). In fact, Longstaff et al. 
(2005) showed that the dynamics of credit default swaps also depends on the liquidity risk. 
For instance, the most liquid CDS contract is generally the 5Y CDS. The liquidity on the 
other maturities depends on the reference entity and other characteristics such as the bond 
market liquidity. For example, the liquidity may be higher for short maturities when the
credit risk of the reference entity is very high. 


Initially, CDS were used to hedge the credit risk of corporate bonds by banks and 
insurance companies. This hedging mechanism is illustrated in Figure 3.18. We assume that 
the bond holder buys a protection using a CDS, whose fixing dates of the premium leg are 
exactly the same as the coupon dates of the bond. We also assume that the credit event is
the bond default and the notional of the CDS is equal to the notional of the bond. At each
fixing date \(t_m\), the bond holder receives the coupon \(C(t_m)\) of the bond and pays to the
protection seller the premium \(s \cdot N\). This implies that the net cash flow is \(C(t_m) - s \cdot N\). If
the default occurs, the value of the bond becomes \(R \cdot N\), but the protection seller pays to
the bond holder the default leg \((1 - R) \cdot N\). In case of default, the net cash flow is then equal
to \(R \cdot N + (1 - R) \cdot N = N\), meaning that the exposure on the defaultable bond is perfectly
hedged. We deduce that the annualized return \(R\) of this hedged portfolio is the difference
between the yield to maturity \(y\) of the bond and the annual cost \(s\) of the protection:


R=y—s (3.7) 


We recognize a new formulation of Equation (3.4) on page 135. In theory, R is then equal 
to the yield to maturity y* of the bond without credit risk. 


FIGURE 3.18: Hedging a defaultable bond with a credit default swap 


Since the 2000s, end-users of CDS are banks and securities firms, insurance firms in- 
cluding pension funds, hedge funds and mutual funds. They continue to be used as hedging 
instruments, but they have also become financial instruments to express views about credit 
risk. In this case, ‘long credit’ refers to the position of the protection seller who is exposed
to the credit risk, whereas ‘short credit’ is the position of the protection buyer who sold the
credit risk of the reference entity²⁵. To understand the mark-to-market of such positions,
we consider the initial position at the inception date \(t\) of the CDS contract. In this case, the
CDS spread \(s_t(T)\) is such that the value of the swap is equal to zero. Let us introduce
the notation \(P_{t,t'}(T)\), which defines the mark-to-market of a CDS position whose inception
date is \(t\), valuation date is \(t'\) and maturity date is \(T\). We have:

\[
P_{t,t}^{\mathrm{seller}}(T) = P_{t,t}^{\mathrm{buyer}}(T) = 0
\]

25 Said differently, a long exposure implies that the default results in a loss, whereas a short exposure
implies that the default results in a gain.
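At date \(t' > t\), the mark-to-market price of the CDS is:

\[
P_{t,t'}^{\mathrm{buyer}}(T) = N \left( (1 - R) \int_{t'}^T B_{t'}(u) \, f_{t'}(u) \,\mathrm{d}u - s_t(T) \cdot \mathrm{RPV}_{01} \right)
\]

whereas the value of the CDS spread satisfies the following relationship:

\[
P_{t',t'}^{\mathrm{buyer}}(T) = N \left( (1 - R) \int_{t'}^T B_{t'}(u) \, f_{t'}(u) \,\mathrm{d}u - s_{t'}(T) \cdot \mathrm{RPV}_{01} \right) = 0
\]

We deduce that the P&L of the protection buyer is:

\[
\Pi^{\mathrm{buyer}} = P_{t,t'}^{\mathrm{buyer}}(T) - P_{t,t}^{\mathrm{buyer}}(T)
\]

Using Equation (3.8), we know that \(P_{t,t}^{\mathrm{buyer}}(T) = 0\) and we obtain:

\[
\begin{aligned}
\Pi^{\mathrm{buyer}} &= P_{t,t'}^{\mathrm{buyer}}(T) - P_{t',t'}^{\mathrm{buyer}}(T) \\
&= N \left( (1 - R) \int_{t'}^T B_{t'}(u) \, f_{t'}(u) \,\mathrm{d}u - s_t(T) \cdot \mathrm{RPV}_{01} \right) \\
&\quad - N \left( (1 - R) \int_{t'}^T B_{t'}(u) \, f_{t'}(u) \,\mathrm{d}u - s_{t'}(T) \cdot \mathrm{RPV}_{01} \right) \\
&= N \cdot \left( s_{t'}(T) - s_t(T) \right) \cdot \mathrm{RPV}_{01}
\end{aligned}
\tag{3.8}
\]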


This equation highlights the role of the term \(\mathrm{RPV}_{01}\) when calculating the P&L of the CDS
position. Because \(\Pi^{\mathrm{seller}} = -\Pi^{\mathrm{buyer}}\), we distinguish two cases:

• If \(s_{t'}(T) > s_t(T)\), the protection buyer makes a profit, because this short credit
  exposure has benefited from the increase of the default risk.

• If \(s_{t'}(T) < s_t(T)\), the protection seller makes a profit, because the default risk of the
  reference entity has decreased.
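
The P&L relationship \(\Pi^{\mathrm{buyer}} = N \cdot (s_{t'}(T) - s_t(T)) \cdot \mathrm{RPV}_{01}\) is simple enough to be sketched in a few lines. The function name below is an illustrative choice; the inputs are the 5Y figures reported in Table 3.7 (current spread of 30.08 bps and risky PV01 of 4.744) for a $1 mn position initiated at 10 bps.

```python
def cds_pnl_buyer(notional, spread_inception, spread_now, rpv01):
    """P&L of the protection buyer: N * (s_t' - s_t) * RPV01."""
    return notional * (spread_now - spread_inception) * rpv01


pnl_buyer = cds_pnl_buyer(1e6, 10e-4, 30.08e-4, 4.744)
pnl_seller = -pnl_buyer  # the seller's P&L is the opposite of the buyer's
print(round(pnl_buyer, 2), round(pnl_seller, 2))
```

The sign of the result tells which of the two cases applies: a positive value means that the spread has widened and the short credit position gains.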


Suppose that we are in the first case. To realize its P&L, the protection buyer has three 
options (O’Kane, 2008): 


1. He could unwind the CDS exposure with the protection seller if the latter agrees. This
   implies that the protection seller pays the mark-to-market \(P_{t,t'}^{\mathrm{buyer}}(T)\) to the protection
   buyer.

2. He could hedge the mark-to-market value by selling a CDS on the same reference
   entity and the same maturity. In this situation, he continues to pay the spread \(s_t(T)\),
   but he now receives a premium, whose spread is equal to \(s_{t'}(T)\).

3. He could reassign the CDS contract to another counterparty as illustrated in Figure
   3.19. The new counterparty (the protection buyer C in our case) will then pay the
   coupon rate \(s_t(T)\) to the protection seller. However, the spread is \(s_{t'}(T)\) at time \(t'\),
   which is higher than \(s_t(T)\). This is why the new counterparty also pays the
   mark-to-market \(P_{t,t'}^{\mathrm{buyer}}(T)\) to the initial protection buyer.


FIGURE 3.19: An example of CDS offsetting
Remark 25 When the default risk is very high, CDS are quoted with an upfront²⁶. In this
case, the annual premium leg is equal to \(c^{\star} \cdot N\) where \(c^{\star}\) is a standard value²⁷, and the
protection buyer has to pay an upfront \(\mathrm{UF}_t\) to the protection seller defined as follows:

\[
\mathrm{UF}_t = N \left( (1 - R) \int_t^T B_t(u) \, f_t(u) \,\mathrm{d}u - c^{\star} \cdot \mathrm{RPV}_{01} \right)
\]


Remark 26 Until now, we have simplified the pricing of the premium leg in order to avoid 
complicated calculations. Indeed, if the default occurs between two fixing dates, the protection 
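buyer has to pay the premium accrual. For instance, if \(\tau \in [t_{m-1}, t_m]\), the accrued premium
is equal to \(c \cdot N \cdot (\tau - t_{m-1})\) or equivalently to:

\[
\mathrm{AP} = \sum_{t_m > t} c \cdot N \cdot (\tau - t_{m-1}) \cdot \mathbb{1}\{t_{m-1} \leq \tau \leq t_m\}
\]

We deduce that the stochastic discounted value of the accrued premium is:

\[
\mathrm{SV}_t(\mathrm{AP}) = \sum_{t_m > t} c \cdot N \cdot (\tau - t_{m-1}) \cdot \mathbb{1}\{t_{m-1} \leq \tau \leq t_m\} \cdot e^{-\int_t^{\tau} r(s)\,\mathrm{d}s}
\]

It follows that:

\[
\mathrm{PV}_t(\mathrm{AP}) = c \cdot N \cdot \sum_{t_m > t} \int_{t_{m-1}}^{t_m} (u - t_{m-1}) \, B_t(u) \, f_t(u) \,\mathrm{d}u
\]

All the previous formulas remain valid by replacing the expression of the risky PV01 by the
following term:

\[
\mathrm{RPV}_{01} = \sum_{t_m > t} \left( \Delta t_m \, S_t(t_m) \, B_t(t_m) + \int_{t_{m-1}}^{t_m} (u - t_{m-1}) \, B_t(u) \, f_t(u) \,\mathrm{d}u \right)
\tag{3.9}
\]


26 It was the case several times for CDS on Greece.
27 For distressed names, the default coupon rate \(c^{\star}\) is typically equal to 500 bps.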
Example 24 We assume that the yield curve is generated by the Nelson-Siegel model with
the following parameters: \(\theta_1 = 5\%\), \(\theta_2 = -5\%\), \(\theta_3 = 6\%\) and \(\theta_4 = 10\). We consider several
credit default swaps on the same entity with quarterly coupons and a notional of $1 mn. The
recovery rate \(R\) is set to 40% whereas the default time \(\tau\) is an exponential random variable,
whose parameter \(\lambda\) is equal to 50 bps. We consider seven maturities (6M, 1Y, 2Y, 3Y, 5Y,
7Y and 10Y) and two coupon rates (10 and 100 bps).


To calculate the prices of these CDS, we use Equation (3.5) with \(N = 10^6\), \(c = 10\)
(or 100) \(\times 10^{-4}\), \(\Delta t_m = 1/4\), \(\lambda = 50 \times 10^{-4} = 0.005\), \(R = 0.40\), \(S_t(u) = e^{-0.005 (u - t)}\),
\(f_t(u) = 0.005 \cdot e^{-0.005 (u - t)}\) and \(B_t(u) = e^{-(u - t) R_t(u)}\) where the zero-coupon rate is given
by Equation (3.1). To evaluate the integral, we consider a Gauss-Legendre quadrature of
128th order. By including the accrued premium²⁸, we obtain results reported in Table 3.7.
For instance, the price of the 5Y CDS is equal to $9 527 if \(c = 10 \times 10^{-4}\) and −$33 173 if
\(c = 100 \times 10^{-4}\). In the first case, the protection buyer has to pay an upfront to the protection
seller because the coupon rate is too low. In the second case, the protection buyer receives
the upfront because the coupon rate is too high. We also indicate the spread \(s\) and the risky
PV01. We notice that the CDS spread is almost constant. This is normal since the default
rate is constant. This is why the CDS spread is approximately equal to (1 − 40%) × 50
bps or 30 bps. The difference between the several maturities is due to the yield curve. The
risky PV01 is a useful statistic to compute the mark-to-market. Suppose for instance that
the two parties entered in a 7Y credit default swap of 10 bps spread two years ago. Now,
the residual maturity of the swap is five years, meaning that the mark-to-market of the
protection buyer is equal to:

\[
\begin{aligned}
\Pi^{\mathrm{buyer}} &= 10^6 \times \left( 30.08 \times 10^{-4} - 10 \times 10^{-4} \right) \times 4.744 \\
&= \$9\,526
\end{aligned}
\]

We retrieve the 5Y CDS price (subject to rounding error).
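
The computation of Example 24 can be sketched as follows. Several points are assumptions rather than the author's implementation: Equation (3.1) is not reproduced in this section, so the usual Nelson-Siegel zero-coupon form is assumed; the integrals are evaluated with a composite Simpson rule instead of the Gauss-Legendre quadrature mentioned above; and the names `zero_rate`, `simpson` and `cds_metrics` are illustrative.

```python
import math

THETA = (0.05, -0.05, 0.06, 10.0)   # Nelson-Siegel parameters of Example 24
LAMBDA, RECOVERY, NOTIONAL = 0.005, 0.40, 1_000_000.0


def zero_rate(tau):
    """Nelson-Siegel zero-coupon rate (assumed form of Equation (3.1))."""
    t1, t2, t3, t4 = THETA
    if tau == 0.0:
        return t1 + t2
    x = tau / t4
    a = (1.0 - math.exp(-x)) / x
    return t1 + t2 * a + t3 * (a - math.exp(-x))


def discount(tau):
    return math.exp(-zero_rate(tau) * tau)


def survival(tau):
    return math.exp(-LAMBDA * tau)


def density(tau):
    return LAMBDA * math.exp(-LAMBDA * tau)


def simpson(g, a, b, n=20):
    """Composite Simpson rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0


def cds_metrics(maturity, coupon, accrued=True, freq=4):
    """Return (price for the protection buyer, par spread, risky PV01)."""
    dt = 1.0 / freq
    default_leg, rpv01 = 0.0, 0.0
    for m in range(1, round(maturity * freq) + 1):
        t0, t1 = (m - 1) * dt, m * dt
        # Default leg: integral of B_t(u) f_t(u) over the period
        default_leg += simpson(lambda u: discount(u) * density(u), t0, t1)
        # Premium leg: survival-weighted discounted accrual factor
        rpv01 += dt * survival(t1) * discount(t1)
        if accrued:
            # Accrued premium term of the risky PV01 (Remark 26)
            rpv01 += simpson(
                lambda u: (u - t0) * discount(u) * density(u), t0, t1)
    default_leg *= 1.0 - RECOVERY
    spread = default_leg / rpv01
    price = NOTIONAL * (default_leg - coupon * rpv01)
    return price, spread, rpv01


price, spread, rpv01 = cds_metrics(5.0, 10e-4)
print(round(price), round(spread * 1e4, 2), round(rpv01, 3))
```

Under these assumptions, the 5Y figures should be close to those of Table 3.7, and calling `cds_metrics(5.0, 10e-4, accrued=False)` gives the variant without the accrued premium.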


TABLE 3.7: Price, spread and risky PV01 of CDS contracts

              P_t(T)
  T      c = 10    c = 100        s     RPV01
 1/2        998     −3 492    30.01     0.499
  1       1 992     −6 963    30.02     0.995
  2       3 956    −13 811    30.04     1.974
  3       5 874    −20 488    30.05     2.929
  5       9 527    −33 173    30.08     4.744
  7      12 884    −44 804    30.10     6.410
 10      17 314    −60 121    30.12     8.604

Example 25 We consider a variant of Example 24 by assuming that the default time follows
a Gompertz distribution:

\[
S_t(u) = \exp\left( \varphi \left( 1 - e^{\gamma (u - t)} \right) \right)
\]

The parameters \(\varphi\) and \(\gamma\) are set to 5% and 10%.
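
To see why this specification produces an upward-sloping spread curve, it helps to look at the hazard rate implied by the Gompertz survival function, namely \(\lambda(u) = \varphi \gamma e^{\gamma (u - t)}\). The short sketch below uses illustrative function names and measures time from the inception date (\(t = 0\)).

```python
import math

PHI, GAMMA = 0.05, 0.10   # parameters of Example 25


def gompertz_survival(tau):
    """S(tau) = exp(phi * (1 - exp(gamma * tau)))."""
    return math.exp(PHI * (1.0 - math.exp(GAMMA * tau)))


def gompertz_hazard(tau):
    """Hazard rate -d ln S / d tau = phi * gamma * exp(gamma * tau)."""
    return PHI * GAMMA * math.exp(GAMMA * tau)


def gompertz_density(tau):
    """Density f(tau) = hazard(tau) * S(tau)."""
    return gompertz_hazard(tau) * gompertz_survival(tau)


for tau in (0.0, 1.0, 5.0, 10.0):
    print(tau, round(gompertz_survival(tau), 4),
          round(gompertz_hazard(tau) * 1e4, 1), "bps")
```

The hazard rate starts at 50 bps, the value of λ in Example 24, and then grows exponentially with the horizon.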


28 This means that the risky PV01 corresponds to Equation (3.9). We also report results without taking
into account the accrued premium in Table 3.8. We notice that its impact is limited.


TABLE 3.8: Price, spread and risky PV01 of CDS contracts (without the accrued premium)

              P_t(T)
  T      c = 10    c = 100        s     RPV01
 1/2        999     −3 489    30.03     0.499
  1       1 993     −6 957    30.04     0.994
  2       3 957    −13 799    30.06     1.973
  3       5 876    −20 470    30.07     2.927
  5       9 530    −33 144    30.10     4.742
  7      12 888    −44 764    30.12     6.406
 10      17 319    −60 067    30.14     8.598

Results are reported in Table 3.9. In this example, the spread is increasing with the 
maturity of the CDS. Until now, we have assumed that we know the survival function 
S; (u) in order to calculate the CDS spread. However, in practice, the CDS spread s is a 
market price and S; (u) has to be determined thanks to a calibration procedure. Suppose 
for instance that we postulate that \(\tau\) is an exponential default time with parameter \(\lambda\). We
can calibrate the estimated value \(\hat{\lambda}\) such that the theoretical price is equal to the market
price. For instance, Table 3.9 shows the parameter \(\hat{\lambda}\) for each CDS. We found that \(\hat{\lambda}\) is
equal to 51.28 bps for the six-month maturity and 82.92 bps for the ten-year maturity. We
face here an issue, because the parameter \(\hat{\lambda}\) is not constant, meaning that we cannot use an
exponential distribution to represent the default time of the reference entity. This is why
we generally consider a more flexible survival function to calibrate the default probabilities
from a set of CDS spreads²⁹.
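
The calibration step can be sketched with a simple bisection. To keep the example short, a flat interest rate is assumed instead of the Nelson-Siegel curve and the accrued premium is ignored, so the calibrated values will differ slightly from those of Table 3.9; `par_spread` and `calibrate_lambda` are illustrative names.

```python
import math


def par_spread(lam, maturity, recovery=0.40, rate=0.0, freq=4):
    """Par spread for an exponential default time and a flat rate
    (simplifying assumptions; no accrued premium)."""
    dt = 1.0 / freq
    k = lam + rate
    # Risky PV01: sum of dt * S(t_m) * B(t_m)
    rpv01 = sum(dt * math.exp(-k * m * dt)
                for m in range(1, round(maturity * freq) + 1))
    # Default leg: (1 - R) * integral of B(u) f(u) du, in closed form
    default_leg = (1.0 - recovery) * lam * (1.0 - math.exp(-k * maturity)) / k
    return default_leg / rpv01


def calibrate_lambda(market_spread, maturity, lo=1e-8, hi=5.0,
                     tol=1e-12, **kwargs):
    """Find lambda such that the model spread matches the market spread.
    The model spread is increasing in lambda, so bisection applies."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if par_spread(mid, maturity, **kwargs) < market_spread:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# 6M spread of 30.77 bps from Table 3.9, with zero rates as a simplification
lam_6m = calibrate_lambda(30.77e-4, 0.5)
print(round(lam_6m * 1e4, 2), "bps")
```

Repeating the calibration maturity by maturity yields a different value for each CDS, which is the inconsistency discussed in the text.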


TABLE 3.9: Calibration of the CDS spread curve using the exponential model

              P_t(T)
  T      c = 10    c = 100        s     RPV01   λ̂ (in bps)
 1/2      1 037     −3 454    30.77     0.499      51.28
  1       2 146     −6 808    31.57     0.995      52.59
  2       4 585    −13 175    33.24     1.973      55.34
  3       7 316    −19 026    35.00     2.927      58.25
  5      13 631    −28 972    38.80     4.734      64.54
  7      21 034    −36 391    42.97     6.380      71.44
 10      33 999    −42 691    49.90     8.521      82.92

3.1.3.3 Basket default swap 


A basket default swap is similar to a credit default swap except that the underlying 
asset is a basket of reference entities rather than one single reference entity. These products 
are part of multi-name credit default swaps with collateralized debt obligations. 


First-to-default and k-th-to-default credit derivatives Let us consider a credit portfolio
with \(n\) reference entities, which are referenced by the index \(i\). With a first-to-default
(FtD) credit swap, the credit event corresponds to the first time that a reference entity of the 


29 This problem will be solved later in Section 3.3.3.1 on page 203.

credit portfolio defaults. We deduce that the stochastic discounted values of the premium
and default legs are³⁰:

\[
\mathrm{SV}_t(\mathrm{PL}) = c \cdot N \cdot \sum_{t_m > t} \Delta t_m \cdot \mathbb{1}\{\tau_{1:n} > t_m\} \cdot e^{-\int_t^{t_m} r(s)\,\mathrm{d}s}
\]

and:

\[
\mathrm{SV}_t(\mathrm{DL}) = X \cdot \mathbb{1}\{\tau_{1:n} \leq T\} \cdot e^{-\int_t^{\tau_{1:n}} r(s)\,\mathrm{d}s}
\]

where \(\tau_i\) is the default time of the i-th reference entity, \(\tau_{1:n} = \min(\tau_1, \ldots, \tau_n)\) is the first
default time in the portfolio and \(X\) is the payout of the protection leg:

\[
X = \sum_{i=1}^{n} \mathbb{1}\{\tau_{1:n} = \tau_i\} \cdot (1 - R_i) \cdot N_i = (1 - R_{i^{\star}}) \cdot N_{i^{\star}}
\]

In this formula, \(R_i\) and \(N_i\) are respectively the recovery and the notional of the i-th reference
entity whereas the index \(i^{\star} = \{i : \tau_i = \tau_{1:n}\}\) corresponds to the first reference entity that
defaults. For instance, if the portfolio is composed of 10 names and the third name is the
first default, the value of the protection leg will be equal to \((1 - R_3) \cdot N_3\). Using the same
assumptions as previously, we deduce that the FtD spread is:

\[
s^{\mathrm{FtD}} = \frac{\mathbb{E}\left[ X \cdot \mathbb{1}\{\tau_{1:n} \leq T\} \cdot B_t(\tau_{1:n}) \right]}{N \sum_{t_m > t} \Delta t_m \cdot S_{1:n,t}(t_m) \cdot B_t(t_m)}
\]

where \(S_{1:n,t}(u)\) is the survival function of \(\tau_{1:n}\). If we assume a homogeneous basket (same
recovery \(R_i = R\) and same notional \(N_i = N\)), the previous formula becomes:

\[
s^{\mathrm{FtD}} = \frac{(1 - R) \int_t^T B_t(u) \, f_{1:n,t}(u) \,\mathrm{d}u}{\sum_{t_m > t} \Delta t_m \, S_{1:n,t}(t_m) \, B_t(t_m)}
\tag{3.10}
\]

where \(f_{1:n,t}(u)\) is the density function of \(\tau_{1:n}\).


To compute the spread³¹, we use Monte Carlo simulation (or numerical integration
when the number of entities is small). In fact, the survival function of \(\tau_{1:n}\) is related to
the individual survival functions, but also to the dependence between the default times
\(\tau_1, \ldots, \tau_n\). The spread of the FtD is then a function of default correlations³². If we denote
by \(s_i^{\mathrm{CDS}}\) the CDS spread of the i-th reference, we can show that:

\[
\max\left( s_1^{\mathrm{CDS}}, \ldots, s_n^{\mathrm{CDS}} \right) \leq s^{\mathrm{FtD}} \leq \sum_{i=1}^{n} s_i^{\mathrm{CDS}}
\tag{3.11}
\]

When the default times are uncorrelated, the FtD is equivalent to buying the basket of all
the credit default swaps. In the case of a perfect correlation, one default is immediately
followed by the other \(n - 1\) defaults, implying that the FtD is equivalent to the CDS with
the worst spread. In practice, the FtD spread is therefore located between these two bounds
as expressed in Equation (3.11). From the viewpoint of the protection buyer, a FtD is seen
as a hedging method of the credit portfolio with a lower cost than buying the protection

30 In order to simplify the notations, we do not take into account the accrued premium.

31 Laurent and Gregory (2005) provide semi-explicit formulas that are useful for pricing basket default
swaps.

32 This point is developed in Section 3.3.4 on page 220 and in Chapter 11 dedicated to copula functions.

for all the credits. For example, suppose that the protection buyer would like to be hedged
against the default of the automobile sector. He can buy a FtD on the basket of the largest
car manufacturers in the world, e.g. Volkswagen, Toyota, Hyundai, General Motors, Fiat
Chrysler and Renault. If there is only one default, the protection buyer is hedged. However,
the protection buyer keeps the risk of multiple defaults, which is a worst-case scenario.
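
The two bounds of the FtD spread can be illustrated with a small Monte Carlo experiment. The sketch below rests on strong simplifying assumptions that are not part of the text: exponential default times, zero interest rates, a continuously paid premium and a homogeneous recovery rate, so that the spread reduces to \((1 - R) \cdot \Pr\{\tau_{1:n} \leq T\} / \mathbb{E}[\min(\tau_{1:n}, T)]\). The comonotonic case stands in for perfect dependence; the function name and the hazard rates are hypothetical.

```python
import math
import random


def ftd_spread_mc(lambdas, recovery, maturity, comonotonic=False,
                  n_sims=100_000, seed=42):
    """Monte Carlo FtD spread with zero rates and continuous premium."""
    rng = random.Random(seed)
    n_defaults, premium_time = 0, 0.0
    for _ in range(n_sims):
        if comonotonic:
            # Perfect dependence: one uniform draw drives all default times
            e = -math.log(1.0 - rng.random())
            first = min(e / lam for lam in lambdas)
        else:
            # Independence: one exponential draw per reference entity
            first = min(-math.log(1.0 - rng.random()) / lam
                        for lam in lambdas)
        if first <= maturity:
            n_defaults += 1
        premium_time += min(first, maturity)
    # Ratio of the default leg to the premium leg per unit of spread
    return (1.0 - recovery) * n_defaults / premium_time


lambdas = [0.01, 0.02, 0.03]   # hypothetical hazard rates
s_indep = ftd_spread_mc(lambdas, 0.40, 5.0)
s_comon = ftd_spread_mc(lambdas, 0.40, 5.0, comonotonic=True)
print(round(s_indep * 1e4), "bps (independent)")
print(round(s_comon * 1e4), "bps (perfect dependence)")
```

In this setting the individual CDS spreads are \((1 - R) \cdot \lambda_i\), so the independent case should be close to their sum and the comonotonic case close to the largest one, in line with Equation (3.11).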


Remark 27 The previous analysis can be extended to k-th-to-default swaps. In this case,
the default leg is paid if the k-th default occurs before the maturity date. We then obtain a
similar expression as Equation (3.10) by considering the order statistic \(\tau_{k:n}\) in place of \(\tau_{1:n}\).

From a theoretical point of view, it is equivalent to buy the CDS protection for all the
components of the credit basket or to buy all the k-th-to-default swaps. We have therefore
the following relationship, where \(s^{k:n}\) denotes the spread of the k-th-to-default swap:

\[
\sum_{i=1}^{n} s_i^{\mathrm{CDS}} = \sum_{k=1}^{n} s^{k:n}
\tag{3.12}
\]

We see that the default correlation highly impacts the distribution of the k-th-to-default
spreads³³.


Credit default indices Credit derivative indices³⁴ were first developed by J.P.
Morgan, Morgan Stanley and iBoxx between 2001 and 2003. A credit default index (or
CDX) is in fact a credit default swap on a basket of reference entities. As previously, we
consider a portfolio with \(n\) credit entities. The protection buyer pays a premium leg with a
coupon rate \(c\). Every time a reference entity defaults, the notional is reduced by a factor,
which is equal to \(1/n\). At the same time, the protection buyer receives the portfolio loss
between two fixing dates. The expression of the notional outstanding is then given by:

\[
N_t(u) = N \cdot \left( 1 - \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{\tau_i \leq u\} \right)
\]

At the inception date, we verify that \(N_t(t) = N\). After the first default, the notional
outstanding is equal to \(N (1 - 1/n)\). After the k-th default, its value is \(N (1 - k/n)\). At time
\(u > t\), the cumulative loss of the credit portfolio is:

\[
L_t(u) = \sum_{i=1}^{n} \frac{N}{n} \cdot (1 - R_i) \cdot \mathbb{1}\{\tau_i \leq u\}
\]

meaning that the incremental loss between two fixing dates is:

\[
\Delta L_t(t_m) = L_t(t_m) - L_t(t_{m-1})
\]

We deduce that the stochastic discounted value of the premium and default legs is:

\[
\mathrm{SV}_t(\mathrm{PL}) = c \cdot \sum_{t_m > t} \Delta t_m \cdot N_t(t_m) \cdot e^{-\int_t^{t_m} r(s)\,\mathrm{d}s}
\]

and:

\[
\mathrm{SV}_t(\mathrm{DL}) = \sum_{t_m > t} \Delta L_t(t_m) \cdot e^{-\int_t^{t_m} r(s)\,\mathrm{d}s}
\]


33 See page 762 for an illustration.
34 They are also known as synthetic credit indices, credit default swap indices or credit default indices.
We deduce that the spread of the CDX is:

\[
s^{\mathrm{CDX}} = \frac{\mathbb{E}\left[ \sum_{t_m > t} \Delta L_t(t_m) \cdot B_t(t_m) \right]}{\mathbb{E}\left[ \sum_{t_m > t} \Delta t_m \cdot N_t(t_m) \cdot B_t(t_m) \right]}
\tag{3.13}
\]

Remark 28 A CDX is then equivalent to a portfolio of CDS, each with a notional principal
equal to \(N/n\). Indeed, when a default occurs, the protection buyer receives \(N/n \cdot (1 - R_i)\)
and stops paying the premium leg of the defaulted reference entity. At the inception date,
the annual premium of the CDX is then equal to the annual premium of the CDS portfolio:

\[
s^{\mathrm{CDX}} \cdot N = \sum_{i=1}^{n} s_i^{\mathrm{CDS}} \cdot \frac{N}{n}
\]

We deduce that the spread of the CDX is an average of the credit spreads that compose the
portfolio³⁵:

\[
s^{\mathrm{CDX}} = \frac{1}{n} \sum_{i=1}^{n} s_i^{\mathrm{CDS}}
\tag{3.14}
\]

Today, credit default indices are all managed by Markit and have been standardized. For
instance, coupon payments are made on a quarterly basis (March 20, June 20, September
20, December 20) whereas indices roll every six months with an updated portfolio³⁶. With
respect to the original credit indices, Markit continues to produce two families:

• Markit CDX
  It focuses on North America and Emerging Markets credit default indices. The three
  major sub-indices are IG (investment grade), HY (high yield) and EM (emerging
  markets). A more comprehensive list is provided in Table 3.10. Besides these credit
  default indices, Markit CDX also produces four other important indices: ABX (basket
  of ABS), CMBX (basket of CMBS), LCDX (portfolio of 100 US secured senior loans)
  and MCDX (basket of 50 municipal bonds).

• Markit iTraxx
  It focuses on Europe, Japan, Asia ex-Japan and Australia (see the list in Table 3.11).
  Markit iTraxx also produces LevX (portfolio of 40 European secured loans), sector
  indices (e.g. European financials and industrials) and SovX, which corresponds
  to a portfolio of sovereign issuers. There are 7 SovX indices: Asia Pacific, BRIC,
  CEEMEA³⁷, G7, Latin America, Western Europe and Global Liquid IG.


In Table 3.12, we report the spread of some CDX/iTraxx indices. We note that the spread 
of the CDX.NA.HY index is on average four times larger than the spread of the CDX.NA.IG 
index. While spreads of credit default indices have generally decreased between December 
2012 and December 2014, we observe a reversal in 2015. For instance, the spread of the 
CDX.NA.IG index is equal to 93.6 bps in September 2015 whereas it was only equal to 
66.3 bps nine months earlier. We observe a similar increase of 30 bps for the iTraxx Europe
index. For the CDX.NA.HY index, it is more impressive with a variation of +150 bps in 
nine months. 


35In fact, this is an approximation because the payment of the default leg does not exactly match between 
the CDX index and the CDS portfolio. 

36See Markit (2014) for a detailed explanation of the indices’ construction. 

37Central and Eastern Europe, Middle East and Africa. 
TABLE 3.10: List of Markit CDX main indices


Index name Description n R 
CDX.NA.IG Investment grade entities 125 40% 
CDX.NA.IG.HVOL High volatility IG entities 30 40% 
CDX.NA.XO Crossover entities 35 40% 
CDX.NA.HY High yield entities 100 30% 
CDX.NA.HY.BB High yield BB entities 37 30% 
CDX.NA.HY.B High yield B entities 46 30% 
CDX.EM EM sovereign issuers 14 25% 
LCDX Secured senior loans 100 70% 
MCDX Municipal bonds 50 80% 


Source: Markit (2014). 


TABLE 3.11: List of Markit iTraxx main indices 


Index name Description n R 

iTraxx Europe European IG entities 125 40% 
iTraxx Europe HiVol European HVOL IG entities 30 40% 
iTraxx Europe Crossover European XO entities 40 40% 
iTraxx Asia Asian (ex-Japan) IG entities 50 40% 
iTraxx Asia HY Asian (ex-Japan) HY entities 20 25% 
iTraxx Australia Australian IG entities 25 40% 
iTraxx Japan Japanese IG entities 50 35% 
iTraxx SovX G7 G7 governments 7 40% 
iTraxx LevX European leveraged loans 40 40% 


Source: Markit (2014). 


TABLE 3.12: Historical spread of CDX/iTraxx indices (in bps) 


Date          CDX                          iTraxx
              NA.IG    NA.HY    EM         Europe   Japan    Asia
Dec. 2012      94.1    484.4    208.6      117.0    159.1    108.8
Dec. 2013      62.3    305.6    272.4       70.1     67.5    129.0
Dec. 2014      66.3    357.2    341.0       62.8     67.0    106.0
Sep. 2015      93.6    505.3    381.2       90.6     82.2    160.5


3.1.3.4 Collateralized debt obligations 


A collateralized debt obligation (CDO) is another form of multi-name credit default
swap. It corresponds to a pay-through ABS structure38, whose securities are bonds linked
to a series of tranches. If we consider the example given in Figure 3.20, there are four types of
bonds, whose returns depend on the loss of the corresponding tranche (equity, mezzanine, 
senior and super senior). Each tranche is characterized by an attachment point A and a 


38See Figure 3.12 on page 139. 
detachment point D. In our example, we have:


Tranche Equity Mezzanine Senior Super senior 
A 0% 15% 25% 35% 
D 15% 25% 35% 100% 


The protection buyer of the tranche [A, D] pays a coupon rate $c^{[A,D]}$ on the nominal outstanding
amount of the tranche to the protection seller. In return, he receives the protection
leg, which is the loss of the tranche [A, D]. However, the losses satisfy a payment priority 
which is the following: 


[Figure 3.20: the assets (credit portfolio) are shown on the left and the liabilities on the right, sliced into four tranches: equity (0–15%), mezzanine (15–25%), senior (25–35%) and super senior (35–100%).]


FIGURE 3.20: Structure of a collateralized debt obligation 


• the equity tranche is the most risky security, meaning that the first losses hit this
tranche alone until the cumulative loss reaches the detachment point;


• from the time the portfolio loss is larger than the detachment point of the equity
tranche, the equity tranche no longer exists and it is the protection seller of the
mezzanine tranche who pays the next losses to the protection buyer of the mezzanine
tranche;


• the protection buyer of a tranche pays the coupon from the inception of the CDO until
the death of the tranche, i.e., when the cumulative loss is larger than the detachment
point of the tranche; moreover, the premium payments are made on the reduced
notional after each credit event of the tranche.


Each CDO tranche can then be viewed as a CDS with a time-varying notional principal to
define the premium leg and a protection leg, which is paid if the portfolio loss is between
the attachment and detachment points of the tranche. We can therefore interpret a CDO
as a basket default swap, where the equity, mezzanine, senior and super senior tranches
correspond respectively to first-to-default, second-to-default, third-to-default and
last-to-default swaps.


Let us now see the mathematical framework to price a CDO tranche. Assuming a port- 
folio of n credits, the cumulative loss is equal to: 


$$ L_t(u) = \sum_{i=1}^{n} N_i \cdot (1 - R_i) \cdot \mathbb{1}\{\tau_i \le u\} $$


whereas the loss of the tranche [A, D] is given by39:


$$ L_t^{[A,D]}(u) = (L_t(u) - A) \cdot \mathbb{1}\{A \le L_t(u) \le D\} + (D - A) \cdot \mathbb{1}\{L_t(u) > D\} $$


where A and D are the attachment and detachment points expressed in $. The nominal 
outstanding amount of the tranche is therefore: 


$$ N_t^{[A,D]}(u) = (D - A) - L_t^{[A,D]}(u) $$


This notional principal then decreases by the loss of the tranche. At the inception of the
CDO, $N_t^{[A,D]}(t)$ is equal to the tranche thickness: (D − A). At the maturity date T, we
have:


$$ N_t^{[A,D]}(T) = (D - A) - L_t^{[A,D]}(T) = \begin{cases} D - A & \text{if } L_t(T) \le A \\ D - L_t(T) & \text{if } A < L_t(T) \le D \\ 0 & \text{if } L_t(T) > D \end{cases} $$
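The tranche loss and the outstanding notional defined above can be illustrated with a minimal Python sketch. The function names and the $100 mn portfolio are assumptions made for the illustration; the attachment and detachment points are those of the four-tranche example, expressed in $ mn.

```python
def tranche_loss(portfolio_loss, attach, detach):
    """Loss absorbed by the tranche [attach, detach], all amounts in $ mn."""
    # Compact form of the indicator expression: min(D - A, (L - A)^+).
    return min(detach - attach, max(portfolio_loss - attach, 0.0))


def tranche_notional(portfolio_loss, attach, detach):
    """Outstanding notional: tranche thickness minus the tranche loss."""
    return (detach - attach) - tranche_loss(portfolio_loss, attach, detach)


# Hypothetical $100 mn portfolio sliced as in the four-tranche example.
TRANCHES = {
    "equity": (0.0, 15.0),
    "mezzanine": (15.0, 25.0),
    "senior": (25.0, 35.0),
    "super senior": (35.0, 100.0),
}

if __name__ == "__main__":
    for loss in (10.0, 20.0, 40.0):
        split = {name: tranche_loss(loss, a, d) for name, (a, d) in TRANCHES.items()}
        print(f"portfolio loss {loss}: {split}")
```

Because the tranches cover the whole capital structure without overlap, the tranche losses always add up to the portfolio loss.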


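We deduce that the stochastic discounted value of the premium and default legs is:


$$ SV_t(PL) = c^{[A,D]} \cdot \sum_{t_m \ge t} \Delta t_m \cdot N_t^{[A,D]}(t_m) \cdot e^{-\int_t^{t_m} r_s \, ds} $$


and:


$$ SV_t(DL) = \sum_{t_m \ge t} \Delta L_t^{[A,D]}(t_m) \cdot e^{-\int_t^{t_m} r_s \, ds} $$


Therefore, the spread of the CDO tranche is40:


$$ s^{[A,D]} = \frac{\mathbb{E}\left[\sum_{t_m \ge t} \Delta L_t^{[A,D]}(t_m) \cdot B_t(t_m)\right]}{\mathbb{E}\left[\sum_{t_m \ge t} \Delta t_m \cdot N_t^{[A,D]}(t_m) \cdot B_t(t_m)\right]} $$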


We obviously have the following inequalities: 


$$ s^{\text{Equity}} > s^{\text{Mezzanine}} > s^{\text{Senior}} > s^{\text{Super senior}} $$
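The spread formula can be evaluated numerically once scenarios of default times are available. The sketch below is only an illustration under strong assumptions that are not in the text: homogeneous exposures and recovery rates, a flat interest rate, equally spaced payment dates, and equally weighted scenarios supplied by the caller (how they are generated, and with which default correlation, is left open). The function and parameter names are hypothetical.

```python
import math


def tranche_loss(portfolio_loss, attach, detach):
    """Tranche loss in the compact form min(D - A, (L - A)^+)."""
    return min(detach - attach, max(portfolio_loss - attach, 0.0))


def tranche_spread(scenarios, exposure, recovery, attach, detach,
                   maturity, freq=4, rate=0.0):
    """Ratio of the expected discounted default leg to the expected
    discounted premium leg (for a unit coupon) over the given scenarios.

    scenarios: list of lists of default times (in years), one list per scenario.
    Assumes at least one scenario leaves some tranche notional outstanding,
    otherwise the premium leg is zero and the ratio is undefined.
    """
    dt = 1.0 / freq
    dates = [dt * (m + 1) for m in range(round(maturity * freq))]
    default_leg = 0.0
    premium_leg = 0.0
    for default_times in scenarios:
        previous = 0.0
        for t_m in dates:
            portfolio_loss = sum(exposure * (1.0 - recovery)
                                 for tau in default_times if tau <= t_m)
            current = tranche_loss(portfolio_loss, attach, detach)
            discount = math.exp(-rate * t_m)  # flat-rate discount factor
            default_leg += (current - previous) * discount
            premium_leg += dt * (detach - attach - current) * discount
            previous = current
    # Both legs are sums over the same scenarios, so the 1/len(scenarios)
    # factor of the two expectations cancels in the ratio.
    return default_leg / premium_leg


if __name__ == "__main__":
    # Three hand-made scenarios for a hypothetical portfolio of 10 names,
    # each with an exposure of $10 mn and a 40% recovery rate.
    scenarios = [[0.5, 1.5, 2.5, 3.5], [4.5], []]
    for name, (a, d) in {"equity": (0, 15), "mezzanine": (15, 25)}.items():
        s = tranche_spread(scenarios, 10.0, 0.4, a, d, maturity=5, freq=1, rate=0.02)
        print(name, s)
```

In this toy setting the equity spread comes out above the mezzanine spread, in line with the ordering of tranche spreads.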


39 Another expression is: 
$$ L_t^{[A,D]}(u) = \min\left(D - A, (L_t(u) - A)^+\right) $$


40This formula is obtained by assuming no upfront and accrued interests. 
As in the case of kth-to-default swaps, the distribution of these tranche spreads highly
depends on the default correlation41. Depending on the model and the parameters, we can
therefore favor the protection buyer/seller of one specific tranche with respect to the
other tranches.


When collateralized debt obligations emerged in the 1990s, they were used to transfer 
credit risk from the balance sheet of banks to investors (e.g. insurance companies). They 
were principally portfolios of loans (CLO) or asset-backed securities (ABS CDO). With 
these balance-sheet CDOs, banks could recover regulatory capital in order to issue new
credits. In the 2000s, a new type of CDO was created by considering CDS portfolios as
underlying assets. These synthetic CDOs are also called arbitrage CDOs, because they have
been used by investors to express their market views on credit.


The impressive success of CDOs with investors before the 2008 Global Financial Cri- 
sis is due to the rating mechanism of tranches. Suppose that the underlying portfolio is 
composed of BB rated credits. It is obvious that the senior and super senior tranches will 
be rated higher than BB, because the probability that these tranches will be impacted is 
very low. The slicing approach of CDOs then makes it possible to create high-rated securities from
medium or low-rated debts. Since the appetite of investors for AAA and AA rated bonds
was very strong, CDOs were solutions to meet this demand. Moreover, this led to the
development of rating methods in order to provide an attractive spread. This explains why
most AAA-rated CDO tranches promised a return higher than AAA-rated sovereign and
corporate bonds. In fact, the 2008 GFC has demonstrated that many CDO tranches were
more risky than expected, because the riskiness of the assets was underestimated42.


TABLE 3.13: List of Markit credit default tranches 


Index name           Tranche
CDX.NA.IG            0–3     3–7      7–15     15–100
CDX.NA.HY            0–10    10–15    15–25    25–35    35–100
LCDX                 0–5     5–8      8–12     12–15    15–100
iTraxx Europe        0–3     3–6      6–9      9–12     12–22    22–100
iTraxx Europe XO     0–10    10–15    15–25    25–35    35–100
iTraxx Asia          0–3     3–6      6–9      9–12     12–22
iTraxx Australia     0–3     3–6      6–9      9–12     12–22
iTraxx Japan         0–3     3–6      6–9      9–12     12–22


Source: Markit (2014). 


For some years now, CDOs have been created using credit default indices as the under- 
lying portfolio. For instance, Table 3.13 provides the list of available tranches on Markit 
indices. We notice that attachment and detachment points differ from one index to another.
The first tranche always indicates the equity tranche. For IG underlying assets, the
notional corresponds to the first 3% losses of the portfolio, whereas the detachment point
is higher for crossover or high yield assets. We also notice that some senior tranches are not
traded (Asia, Australia and Japan). These products are mainly used in correlation trading
activities and also serve as benchmarks for all the other OTC credit debt obligations.


41See Section 3.3.4 on page 220. 

42 More details of the impact of the securitization market on the 2008 Global Financial Crisis are developed 
in Chapter 8 dedicated to systemic risk. 

43They are also called credit default tranches (CDT). 
3.2 Capital requirement


This section deals with regulatory aspects of credit risk. From a historical point of view,
credit risk was the first risk to require regulatory capital, before market risk. Nevertheless,
the development of credit risk management is more recent and was accelerated with the Basel 
II Accord. Before presenting the different approaches for calculating capital requirements, 
we need to define more precisely what credit risk is. 


Credit risk is the risk of loss on a debt instrument resulting from the failure of the borrower to
make required payments. We generally distinguish two types of credit risk. The first one is
the ‘default risk’, which arises when the borrower is unable to pay the principal or interest.
An example is a student loan or a mortgage loan. The second type is the ‘downgrading risk’,
which concerns debt securities. In this case, the debt holder may face a loss, because the 
price of the debt security is directly related to the credit risk of the borrower. For instance, 
the price of the bond may go down because the credit risk of the issuer increases and even 
if the borrower does not default. Of course, default risk and downgrading risk are highly 
correlated, because it is rare that a counterparty suddenly defaults without downgrading of 
its credit rating. 


To measure credit risk, we first need to define the default of the obligor. BCBS (2006) 
provides the following standard definition: 


“A default is considered to have occurred with regard to a particular obligor 
when either or both of the two following events have taken place. 


• The bank considers that the obligor is unlikely to pay its credit obligations
to the banking group in full, without recourse by the bank to actions such
as realizing security (if held).


• The obligor is past due more than 90 days on any material credit obligation
to the banking group. Overdrafts will be considered as being past due once
the customer has breached an advised limit or been advised of a limit
smaller than current outstandings” (BCBS, 2006, page 100).


This definition contains both objective elements (when a payment has been missed or de- 
layed) and subjective elements (when a loss becomes highly probable). This last case gener- 
ally corresponds to an extreme situation (specific provision, distressed restructuring, etc.). 
The Basel definition of default covers then two types of credit: debts under litigation and 
doubtful debts. 


Downgrading risk is more difficult to define. If the counterparty is rated by an agency, it 
can be measured by a single or multi-notch downgrade. However, it is not always the case 
in practice, because the credit quality decreases before the downgrade announcement. A 
second measure is to consider a market-based approach by using CDS spreads. However, we 
notice that the two methods concern counterparties, which are able to issue debt securities, 
in particular bonds. For instance, the concept of downgrading risk is difficult to apply for 
retail assets. 

The distinction between default risk and downgrading risk has an impact on the credit 
risk measure. For loans and debt-like instruments that cannot be traded in a market, the 
time horizon for managing credit risk is the maturity of the credit. Contrary to this held-to- 
maturity approach, the time horizon for managing debt securities is shorter, typically one 
year. In this case, the big issue is not to manage the default, but the mark-to-market of the 
credit exposure. 
3.2.1 The Basel I framework


According to Tarullo (2008), two explanatory factors were behind the Basel I Accord. 
The first motivation was to increase capital levels of international banks, which were very 
low at that time and had continuously decreased for many years. For instance, the ratio of 
equity capital to total assets44 was 5.15% in 1970 and only 3.83% in 1981 for the 17 largest
US banks. In 1988, this capital ratio was equal to 2.55% on average for the five largest banks
in the world. The second motivation concerned the distortion risk of competition resulting
from heterogeneous national capital requirements. One point that was made repeatedly,
especially by US bankers, was the growth of Japanese banks. In Table 3.14, we report the
ranking of the world’s 10 largest banks in terms of assets ($ bn) in 1981 and 1988.
While there was only one Japanese bank in the top 10 in 1981, nine Japanese banks were
included in the ranking seven years later. In this context, the underlying idea of the Basel
I Accord was then to increase capital requirements and harmonize national regulations for 
international banks. 


TABLE 3.14: World’s largest banks in 1981 and 1988 


1981 1988 

Bank Assets Bank Assets 

1 Bank of America (US) 115.6 Dai-Ichi Kangyo (JP) 352.5 
2 Citicorp (US) 112.7 Sumitomo (JP) 334.7 
3 BNP (FR) 106.7 Fuji (JP) 327.8 
4 Crédit Agricole (FR) 97.8 Mitsubishi (JP) 317.8 
5 Crédit Lyonnais (FR) 93.7 Sanwa (JP) 307.4 
6 Barclays (UK) 93.0 Industrial Bank (JP) 261.5 
7 Société Générale (FR) 87.0 Norinchukin (JP) 231.7 
8 Dai-Ichi Kangyo (JP) 85.5 Crédit Agricole (FR) 214.4 
9 Deutsche Bank (DE) 84.5 Tokai (JP) 213.5 
10 National Westminster (UK) 82.6 Mitsubishi Trust (JP) 206.0 


Source: Tarullo (2008). 


The Basel I Accord provides a detailed definition of bank capital C and risk-weighted 
assets RWA. We reiterate that tier one (T1) capital consists mainly of common stock and 
disclosed reserves, whereas tier two (T2) capital includes undisclosed reserves, general pro- 
visions, hybrid debt capital instruments and subordinated term debt. Risk-weighted assets 
are simply calculated as the product of the asset notional (the exposure at default or EAD) 
by a risk weight (RW). Table 3.15 shows the different values of RW with respect to the 
category of the asset. For off-balance sheet assets, BCBS (1988) defines a credit conversion
factor (CCF) for converting the amount E of a credit line or off-balance sheet asset to an
exposure at default:

$$ \mathrm{EAD} = E \cdot \mathrm{CCF} $$


The CCF values are 100% for direct credit substitutes (standby letters of credit), sale and 
repurchase agreements, forward asset purchases, 50% for standby facilities and credit lines 
with an original maturity of over one year, note issuance facilities and revolving underwriting 
facilities, 20% for short-term self-liquidating trade-related contingencies and 0% for standby 
facilities and credit lines with an original maturity of up to one year. The above framework 
is used to calculate the Cooke ratio, which is in fact a set of two capital ratios. The core 


44 All the statistics of this section come from Chapters 2 and 3 of Tarullo (2008).
TABLE 3.15: Risk weights by category of on-balance sheet assets


RW      Instruments

0%      Cash
        Claims on central governments and central banks denominated in
        national currency and funded in that currency
        Other claims on OECD central governments and central banks
        Claims† collateralized by cash or OECD government securities

20%     Claims† on multilateral development banks
        Claims† on banks incorporated in the OECD and claims guaranteed
        by OECD incorporated banks
        Claims† on securities firms incorporated in the OECD subject to
        comparable supervisory and regulatory arrangements
        Claims† on banks incorporated in countries outside the OECD with
        a residual maturity of up to one year
        Claims on non-domestic OECD public-sector entities
        Cash items in process of collection

50%     Loans fully secured by mortgage on residential property

100%    Claims on the private sector
        Claims on banks incorporated outside the OECD with a residual
        maturity of over one year
        Claims on central governments outside the OECD and non denominated
        in national currency
        All other assets


†or guaranteed by these entities.


Source: BCBS (1988). 


capital ratio includes only tier one capital whereas the total capital ratio considers both tier
one $C_1$ and tier two $C_2$ capital:


$$ \text{Tier 1 ratio} = \frac{C_1}{\mathrm{RWA}} \ge 4\% $$
$$ \text{Tier 2 ratio} = \frac{C_1 + C_2}{\mathrm{RWA}} \ge 8\% $$


Example 26 The assets of the bank are composed of $100 mn of US treasury bonds, $20 
mn of Mexico government bonds denominated in US dollar, $20 mn of Argentine debt de- 
nominated in Argentine peso, $500 mn of residential mortgage, $500 mn of corporate loans, 
$20 mn of non-used standby facilities for OECD governments and $100 mn of retail credit 
lines, which are decomposed as follows: $40 mn are used and 70% of non-used credit lines 
have a maturity greater than one year. 


For each asset, we calculate RWA by choosing the right risk weight and credit conversion 
factor for off-balance sheet items. We obtain the results below. The risk-weighted assets of 
the bank are then equal to $831 mn. We deduce that the required capital K is $33.24 mn
for tier one.


Balance   Asset                     E     CCF     EAD      RW     RWA
sheet

On-       US bonds                                100      0%       0
          Mexico bonds                             20    100%      20
          Argentine debt                           20      0%       0
          Home mortgage                           500     50%     250
          Corporate loans                         500    100%     500
          Credit lines                             40    100%      40

Off-      Standby facilities       20    100%      20      0%       0
          Credit lines (> 1Y)      42     50%      21    100%      21
          Credit lines (≤ 1Y)      18      0%       0    100%       0

Total                                                             831
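The calculation of Example 26 can be reproduced with a short Python sketch. The data layout and the function name are assumptions made for the illustration; the amounts, credit conversion factors and risk weights are those of the table above.

```python
# (asset, amount E in $ mn, CCF or None for on-balance sheet items, risk weight)
POSITIONS = [
    ("US bonds",            100, None, 0.00),
    ("Mexico bonds",         20, None, 1.00),
    ("Argentine debt",       20, None, 0.00),
    ("Home mortgage",       500, None, 0.50),
    ("Corporate loans",     500, None, 1.00),
    ("Credit lines (used)",  40, None, 1.00),
    ("Standby facilities",   20, 1.00, 0.00),
    ("Credit lines (> 1Y)",  42, 0.50, 1.00),
    ("Credit lines (<= 1Y)", 18, 0.00, 1.00),
]


def risk_weighted_assets(positions):
    """Sum of EAD x RW, with EAD = E x CCF for off-balance sheet items."""
    total = 0.0
    for _name, amount, ccf, risk_weight in positions:
        ead = amount if ccf is None else amount * ccf
        total += ead * risk_weight
    return total


rwa = risk_weighted_assets(POSITIONS)
tier_one_capital = 0.04 * rwa   # minimum implied by the tier 1 ratio of 4%
total_capital = 0.08 * rwa      # minimum implied by the total ratio of 8%

if __name__ == "__main__":
    print(f"RWA = {rwa:.2f}, tier one = {tier_one_capital:.2f}, "
          f"total = {total_capital:.2f}")
```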


3.2.2 The Basel II standardized approach 


The main criticism of the Cooke ratio is the lack of economic rationale with respect to 
risk weights. Indeed, most of the claims have a 100% risk weight and do not reflect the 
real credit risk of the borrower. Other reasons have been given to justify a reformulation of 
capital requirements for credit risk with the goal to: 


e obtain a better credit risk measure by taking into account the default probability of 
the counterparty; 


e avoid regulatory arbitrage, in particular by using credit derivatives; 


e have a more coherent framework that supports credit risk mitigation. 


3.2.2.1 Standardized risk weights 


In Basel II, the probability of default is the key parameter to define risk weights. For 
the standardized approach (SA), they depend directly on external ratings whereas they are 
based on internal rating for the IRB approach. Table 3.16 shows the new matrix of risk 
weights, when we consider the Standard & Poor’s rating system. We notice that there are 
four main categories of claims46: sovereigns, banks, corporates and retail portfolios.

The sovereign exposure category includes central governments and central banks, whereas
non-central public sector entities are treated with the bank exposure category. We note that
there are two options for the latter, whose choice is left to the discretion of the national
supervisors47. Under the first option, the risk weight depends on the rating of the country
where the bank is located. Under the second option, it is the rating of the bank that 
determines the risk weight, which is more favorable for short-term claims (three months or 
less). The risk weight of a corporate is calculated with respect to the rating of the entity, but 
uses a slightly different breakdown of ratings than the second option of the bank category. 
Finally, the Basel Committee uses lower levels for retail portfolios than those provided in 
the Basel I Accord. Indeed, residential mortgages and retail loans are now risk-weighted at 
35% and 75% instead of 50% and 100% previously. Other comparisons between Basel I and 
Basel II (with the second option for banks) are shown in Table 3.17. 


45NR stands for non-rated entities. 

46The regulatory framework is more comprehensive by considering three other categories (public sector 
entities, multilateral development banks and securities firms), which are treated as banks. For all other 
assets, the standard risk weight is 100%. 

47The second option is more frequent and was implemented in Europe, US and Japan for instance. 
TABLE 3.16: Risk weights of the SA approach (Basel II)


                         AAA      A+       BBB+     BB+      CCC+
Rating                   to       to       to       to       to       NR
                         AA−      A−       BBB−     B−       C
Sovereigns               0%       20%      50%      100%     150%     100%
Banks (option 1)         20%      50%      100%     100%     150%     100%
Banks (option 2)         20%      50%      50%      100%     150%     50%

                                           BBB+ to BB−       B+ to C
Corporates               20%      50%      100%              150%     100%
Retail                   75%
Residential mortgages    35%
Commercial mortgages     100%


TABLE 3.17: Comparison of risk weights between Basel I and Basel II 


Entity Rating Maturity Basel I Basel II 
Sovereign (OECD) AAA 0% 0% 
Sovereign (OECD) A- 0% 20% 
Sovereign BBB 100% 50% 
Bank (OECD) BBB 2Y 20% 50% 
Bank BBB 2M 100% 20% 
Corporate AA+ 100% 20% 
Corporate BBB 100% 100% 


The SA approach is based on external ratings and then depends on credit rating agencies. 
The most famous are Standard & Poor’s, Moody’s and Fitch. However, they cover only large 
companies. This is why banks will also consider rating agencies specialized in a specific sector
or a given country48. Of course, rating agencies must be first registered and certified by the
national supervisor in order to be used by the banks. The validation process consists of 
two steps, which are the assessment of the six required criteria (objectivity, independence, 
transparency, disclosure, resources and credibility) and the mapping process between the 
ratings and the Basel matrix of risk weights. 


Table 3.18 shows the rating systems of S&P, Moody’s and Fitch, which are very similar. 
Examples of S&P’s rating are given in Tables 3.19, 3.20 and 3.21. We note that the rating 
of many sovereign counterparties has been downgraded by at least one notch, except China,
which now has a better rating than before the 2008 GFC. For some countries, the rating
in local currency is different from the rating in foreign currency, for instance Argentina,
Brazil, Russia and Ukraine49. We observe the same evolution for banks and it is now rare
to find a bank with a AAA rating. This is not the case of corporate counterparties, which 
present more stable ratings across time. 


Remark 29 Credit conversion factors for off-balance sheet items are similar to those de- 
fined in the original Basel Accord. For instance, any commitment that is unconditionally 
cancelable receives a 0% CCF. A CCF of 20% (resp. 50%) is applied to commitments with 


48For instance, banks may use Japan Credit Rating Agency Ltd for Japanese public and corporate en- 
tities, DBRS Ratings Limited for bond issuers, Cerved Rating Agency for Italian small and medium-sized 
enterprises, etc. 

49 An SD rating is assigned in case of selective default of the obligor. 
TABLE 3.18: Credit rating system of S&P, Moody’s and Fitch


             Prime             High Grade            Upper
             Maximum Safety    High Quality          Medium Grade
S&P/Fitch    AAA               AA+    AA     AA−     A+    A     A−
Moody’s      Aaa               Aa1    Aa2    Aa3     A1    A2    A3


             Lower                     Non Investment Grade
             Medium Grade              Speculative
S&P/Fitch    BBB+    BBB     BBB−      BB+    BB     BB−
Moody’s      Baa1    Baa2    Baa3      Ba1    Ba2    Ba3


             Highly             Substantial    In Poor          Extremely
             Speculative        Risk           Standing         Speculative
S&P/Fitch    B+    B     B−     CCC+           CCC     CCC−     CC
Moody’s      B1    B2    B3     Caa1           Caa2    Caa3     Ca


TABLE 3.19: Examples of country’s S&P rating 


Country      Local currency           Foreign currency
             Jun. 2009   Oct. 2015    Jun. 2009   Oct. 2015
Argentina B- CCC+ B- SD 
Brazil BBB+ BBB- BBB- BB+ 
China A+ AA- A+ AA- 
France AAA AA AAA AA 
Italy A+ BBB- A+ BBB- 
Japan AA A+ AA A+ 
Russia BBB+ BBB- BBB BB+ 
Spain AA+ BBB+ AA+ BBB+ 
Ukraine B- CCC+ CCC+ SD 
US AAA AA+ AA+ AA+ 


Source: Standard & Poor’s, www.standardandpoors.com. 


TABLE 3.20: Examples of bank’s S&P rating 


Bank Oct. 2001 Jun. 2009 Oct. 2015 
Barclays Bank PLC AA AA- A- 

Credit Agricole S.A. AA AA- A 
Deutsche Bank AG AA A+ BBB+ 
International Industrial Bank CCC+ BB- 

JPMorgan Chase & Co. AA- A+ A 

UBS AG AA+ A+ A 


Source: Standard & Poor’s, www.standardandpoors.com. 
TABLE 3.21: Examples of corporate’s S&P rating


Corporate Jul. 2009 Oct. 2015 
Danone A- A- 

Exxon Mobil Corp. AAA AAA 
Ford Motor Co. CCC+ BBB- 
General Motors Corp. D BBB- 
L’Oreal S.A. NR NR 
Microsoft Corp. AAA AAA 
Nestle S.A. AA AA 

The Coca-Cola Co. A+ AA 
Unilever PLC A+ A+ 


Source: Standard & Poor’s, www.standardandpoors.com. 


an original maturity up to one year (resp. greater than one year). For revolving underwriting 
facilities, the CCF is equal to 50% whereas it is equal to 100% for other off-balance sheet 
items (e.g. direct credit substitutes, guarantees, sale and repurchase agreements, forward 
asset purchases). 


3.2.2.2 Credit risk mitigation 


Credit risk mitigation (CRM) refers to the various techniques used by banks for reducing
the credit risk. These methods make it possible to decrease the credit exposure or to increase the
recovery in case of default. The most common approaches are collateralized transactions,
guarantees, credit derivatives and netting agreements. 


Collateralized transactions In such operations, the credit exposure of the bank is partially
hedged by collateral posted by the counterparty. BCBS (2006) then defines the following
eligible instruments:


1. Cash and comparable instruments;

2. Gold;

3. Debt securities which are rated AAA to BB- when issued by sovereigns or AAA to BBB-
when issued by other entities or at least A-3/P-3 for short-term debt instruments;

4. Debt securities which are not rated but fulfill certain criteria (senior debt issued by
banks, listed on a recognized exchange and sufficiently liquid);

5. Equities that are included in a main index;

6. UCITS and mutual funds, whose assets are eligible instruments and which offer a
daily liquidity;

7. Equities which are listed on a recognized exchange and UCITS/mutual funds which
include such equities.


The bank has the choice between two approaches to take into account collateralized 
transactions. In the simple approach50, the risk weight of the collateral (with a floor of


50Collateral instruments (7) are not eligible for this approach. 
20%) is applied to the market value of the collateral C whereas the non-hedged exposure
(EAD − C) receives the risk weight of the counterparty:


$$ \mathrm{RWA} = (\mathrm{EAD} - C) \cdot \mathrm{RW} + C \cdot \max(\mathrm{RW}_C, 20\%) \qquad (3.16) $$


where EAD is the exposure at default, C is the market value of the collateral, RW is the
risk weight appropriate to the exposure and $\mathrm{RW}_C$ is the risk weight of the collateral. The
second method, called the comprehensive approach, is based on haircuts. The risk-weighted
asset amount after risk mitigation is RWA = RW · EAD*, where EAD* is the modified
exposure at default defined as follows:


$$ \mathrm{EAD}^{\star} = \max\left(0, (1 + H_E) \cdot \mathrm{EAD} - (1 - H_C - H_{FX}) \cdot C\right) $$


where $H_E$ is the haircut applied to the exposure, $H_C$ is the haircut applied to the collateral
and $H_{FX}$ is the haircut for currency risk. Table 3.22 gives the standard supervisory values
of haircuts. If the bank uses an internal model to calculate haircuts, they must be based on
the value-at-risk with a 99% confidence level and a holding period which depends on the
collateral type and the frequency of remargining. The standard supervisory haircuts have
been calibrated by assuming daily mark-to-market, daily remargining and a 10-business day
holding period.


TABLE 3.22: Standardized supervisory haircuts for collateralized transactions 


Rating           Residual maturity    Sovereigns    Others
AAA to AA−       0–1Y                 0.5%          1%
                 1–5Y                 2%            4%
                 5Y+                  4%            8%
A+ to BBB−       0–1Y                 1%            2%
                 1–5Y                 3%            6%
                 5Y+                  6%            12%
BB+ to BB−                            15%
Cash                                                0%
Gold                                                15%
Main index equities                                 15%
Equities listed on a recognized exchange            25%
FX risk                                             8%


Example 27 We consider a 10-year credit of $100 mn to a corporate firm rated A. The 
credit is guaranteed by five collateral instruments: a cash deposit ($2 mn), a gold deposit ($5 
mn), a sovereign bond rated AA with a 2-year residual maturity ($15 mn) and repurchase 
transactions on Microsoft stocks ($20 mn) and Wirecard51 stocks ($20 mn).


Before credit risk mitigation, the risk-weighted asset amount is equal to: 
RWA = 100 x 50% = $50 mn 


If we consider the simple approach, the repurchase transaction on Wirecard stocks is not 
eligible, because it does not fall within categories (1)-(6). The risk-weighted asset amount 


51 Wirecard is a German financial company specialized in payment processing and issuing services. The 
stock belongs to the MSCI Small Cap Europe index. 
becomes52:

RWA = (100 − 2 − 5 − 15 − 20) × 50% + (2 + 5 + 15 + 20) × 20%
    = $37.40 mn


The repurchase transaction on Wirecard stocks is eligible in the comprehensive approach,
because these equity stocks are traded in Börse Frankfurt. The haircuts are 15% for gold, 2%
for the sovereign bond and 15% for Microsoft stocks53. For Wirecard stocks, a first haircut
of 25% is applied because this instrument belongs to the seventh category and a second
haircut of 8% is applied because there is a foreign exchange risk. The adjusted exposure at
default is then equal to:


EAD* = (1 + 8%) × 100 − 2 − (1 − 15%) × 5 − (1 − 2%) × 15 −
       (1 − 15%) × 20 − (1 − 25% − 8%) × 20
     = $56.65 mn


It follows that:

RWA = 56.65 × 50% = $28.33 mn
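The two collateral approaches can be compared with a small Python sketch applied to the inputs of Example 27. The function names and the data layout are assumptions made for the illustration. In the simple approach, the risk weights entered for cash, gold and the sovereign bond are placeholders below the 20% floor, so only the floor matters for them.

```python
def rwa_simple(ead, risk_weight, collateral):
    """Simple approach: collateral is a list of (market value, RW of collateral)."""
    hedged = sum(value for value, _rw_c in collateral)
    covered = sum(value * max(rw_c, 0.20) for value, rw_c in collateral)
    return (ead - hedged) * risk_weight + covered


def adjusted_exposure(ead, h_e, collateral):
    """Comprehensive approach: collateral is a list of (value, H_C, H_FX)."""
    protected = sum(value * (1.0 - h_c - h_fx) for value, h_c, h_fx in collateral)
    return max(0.0, (1.0 + h_e) * ead - protected)


# Example 27: $100 mn credit with a 50% risk weight.
simple_collateral = [
    (2.0, 0.00),    # cash (placeholder RW, floored at 20%)
    (5.0, 0.00),    # gold (placeholder RW, floored at 20%)
    (15.0, 0.00),   # AA sovereign bond (placeholder RW, floored at 20%)
    (20.0, 0.20),   # Microsoft stocks
]                   # Wirecard stocks are not eligible in the simple approach

comprehensive_collateral = [
    (2.0, 0.00, 0.00),    # cash
    (5.0, 0.15, 0.00),    # gold
    (15.0, 0.02, 0.00),   # AA sovereign bond, 2-year residual maturity
    (20.0, 0.15, 0.00),   # Microsoft stocks (main index equity)
    (20.0, 0.25, 0.08),   # Wirecard stocks (listed equity, FX risk)
]

rwa_s = rwa_simple(100.0, 0.50, simple_collateral)
ead_star = adjusted_exposure(100.0, 0.08, comprehensive_collateral)
rwa_c = 0.50 * ead_star

if __name__ == "__main__":
    print(f"simple approach:        RWA  = {rwa_s:.2f}")
    print(f"comprehensive approach: EAD* = {ead_star:.2f}, RWA = {rwa_c:.2f}")
```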


Guarantees and credit derivatives Banks can use these credit protection instruments 
if they are direct, explicit, irrevocable and unconditional. In this case, banks use the simple 
approach given by Equation (3.16). The case of credit default tranches is covered by rules 
described in the securitization framework. 


Maturity mismatches A maturity mismatch occurs when the residual maturity of the
hedge is less than that of the underlying asset. In this case, the bank uses the following
adjustment:

$$ C_A = C \cdot \frac{\min(T_G, T, 5) - 0.25}{\min(T, 5) - 0.25} \qquad (3.18) $$

where T is the residual maturity of the exposure and $T_G$ is the residual maturity of the
collateral (or guarantee).


Example 28 The bank A has granted a credit of $30 mn to a corporate firm B, which
is rated BB. In order to hedge the default risk, the bank A buys $20 mn of a 3-year CDS
protection on B from the bank C, which is rated A+.


If the residual maturity of the credit is lower than 3 years, we obtain: 
RWA = (30 — 20) x 100% + 20 x 50% = $20 mn 


If the residual maturity is greater than 3 years, we first have to calculate the adjusted value 
of the guarantee. Assuming that the residual maturity is 4 years, we have: 


G_A = 20 × (min (3, 4, 5) − 0.25) / (min (4, 5) − 0.25) = $14.67 mn

It follows that: 


RWA = (30 — 14.67) x 100% + 14.67 x 50% = $22.67 mn 
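The maturity mismatch adjustment of Equation (3.18) and the resulting risk-weighted assets of Example 28 can be checked with the following Python sketch. The function names are hypothetical, and the sketch assumes that the residual maturity of the exposure is larger than 0.25 year so that the denominator is positive.

```python
def adjusted_protection(value, t_hedge, t_exposure):
    """Value of the hedge after the maturity mismatch adjustment.

    Maturities are in years. No adjustment is made when the hedge lasts
    at least as long as the exposure.
    """
    if t_hedge >= t_exposure:
        return value
    numerator = min(t_hedge, t_exposure, 5.0) - 0.25
    denominator = min(t_exposure, 5.0) - 0.25
    return value * numerator / denominator


def rwa_with_guarantee(ead, rw_obligor, protection, rw_guarantor):
    """Simple approach of Equation (3.16) applied to a guarantee or a CDS."""
    return (ead - protection) * rw_obligor + protection * max(rw_guarantor, 0.20)


if __name__ == "__main__":
    # Example 28: $30 mn credit to a BB corporate (RW 100%), $20 mn of
    # 3-year CDS protection bought from a bank rated A+ (RW 50%).
    for t_exposure in (2.0, 4.0):
        g_a = adjusted_protection(20.0, 3.0, t_exposure)
        rwa = rwa_with_guarantee(30.0, 1.00, g_a, 0.50)
        print(f"T = {t_exposure}: G_A = {g_a:.2f}, RWA = {rwa:.2f}")
```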


52 The floor of 20% is applied to the cash, gold and sovereign bond collateral instruments. The risk weight 
for Microsoft stocks is 20% because the rating of Microsoft is AAA. 
53Because Microsoft belongs to the S&P 500 index, which is a main equity index. 
3.2.3 The Basel II internal ratings-based approach


The completion of the internal ratings-based (IRB) approach was a complex task, because
it required many negotiations between regulators, banks and politicians. Tarullo (2008)
points out that the publication of the first consultative paper (CP1) in June 1999 was
both “anticlimactic and contentious”. The paper was curiously vague, without a precise
direction. The only tangible proposal was the use of external ratings. The second consultative
paper was released in January 2001 and included in particular the IRB approach, which had
been essentially developed by US members of the Basel Committee with the support of
large international banks. The press release dated 16 January 2001 indicated that the Basel
Committee would finalize the New Accord by the end of 2001, for an implementation in
2004. However, it took much longer than originally anticipated and the final version
of the New Accord was published in June 2004 and implemented from December 200654. 
The main reason is the difficulty of calibrating the IRB approach in order to satisfy a large 
part of international banks. The IRB formulas of June 2004 are significantly different from 
the original ones and reflect compromises between the different participants without really 
being satisfactory. 


3.2.3.1 The general framework 


Contrary to the standardized approach, the IRB approach is based on internal rating 
systems. With such a method, the objectives of the Basel Committee were to propose a 
more sensitive credit risk measure and define a common basis between internal credit risk 
models. The IRB approach may be seen as an external credit risk model, whose parameters 
are provided by the bank. Therefore, it is not an internal model, but a first step to harmonize 
the internal risk management practices by focusing on the main risk components, which are: 


• the exposure at default (EAD);
• the probability of default (PD);
• the loss given default (LGD);
• the effective maturity (M).


The exposure at default is defined as the outstanding debt at the time of default. For
instance, it is equal to the principal amount for a loan. The loss given default is the expected
percentage of exposure at default that is lost if the debtor defaults. At first approximation,
one can consider that \(\mathrm{LGD} \approx 1 - R\), where \(R\) is the recovery rate. While EAD is expressed
in $, LGD is measured in %. For example, if EAD is equal to $10 mn and LGD is set to 70%,
the expected loss due to the default is equal to $7 mn. The probability of default measures
the default risk of the debtor. In Basel II, the time horizon of PD is set to one year. When
the duration of the credit is not equal to one year, one has to specify its effective maturity
M. It is the combination of the one-year default probability PD and the effective maturity
M that measures the default risk of the debtor over the life of the credit.


54 See Chapter 4 entitled “Negotiating Basel II” of Tarullo (2008) for a comprehensive story of the Basel
II Accord.

In this approach, the credit risk measure is the sum of individual risk contributions:

\[
\mathcal{R} = \sum_{i=1}^{n} \mathrm{RC}_i
\]

where \(\mathrm{RC}_i\) is a function of the four risk components:

\[
\mathrm{RC}_i = f_{\mathrm{IRB}}\left(\mathrm{EAD}_i, \mathrm{LGD}_i, \mathrm{PD}_i, M_i\right)
\]

and \(f_{\mathrm{IRB}}\) is the IRB formula. In fact, there are two IRB methodologies. In the foundation
IRB approach (FIRB), banks use their internal estimates of PD whereas the values of the
other components (EAD, LGD and M) are set by regulators. Banks that adopt the advanced
IRB approach (AIRB) may calculate all four parameters (PD, EAD, LGD and M) using
their own internal models, and not only the probability of default. The mechanism of the
IRB approach is then the following:


• a classification of exposures (sovereigns, banks, corporates, retail portfolios, etc.);

• for each credit \(i\), the bank estimates the probability of default \(\mathrm{PD}_i\);

• it uses the standard regulatory values of the other risk components (\(\mathrm{EAD}_i\), \(\mathrm{LGD}_i\) and
\(M_i\)) or estimates them in the case of AIRB;

• the bank then calculates the risk-weighted assets \(\mathrm{RWA}_i\) of the credit by applying the
right IRB formula \(f_{\mathrm{IRB}}\) to the risk components.


Internal ratings are central to the IRB approach. Table 3.23 gives an example of an internal
rating system, where risk increases with the number grade (1, 2, 3, etc.). Another approach
is to consider alphabetical letter grades⁵⁵. A third approach is to use an internal rating
scale similar to that of S&P⁵⁶.


3.2.3.2 The credit risk model of Basel II 


Decomposing the value-at-risk into risk contributions BCBS (2004a) used the
Merton-Vasicek model (Merton, 1974; Vasicek, 2002) to derive the IRB formula. In this
framework, the portfolio loss is equal to:

\[
L = \sum_{i=1}^{n} w_i \cdot \mathrm{LGD}_i \cdot \mathbf{1}\{\tau_i \le T_i\}
\tag{3.19}
\]

where \(w_i\) and \(T_i\) are the exposure at default and the residual maturity of the i-th credit.
We assume that the loss given default \(\mathrm{LGD}_i\) is a random variable and the default time
\(\tau_i\) depends on a set of risk factors \(X\), whose probability distribution is denoted by \(H\). Let
\(p_i(X)\) be the conditional default probability. It follows that the (unconditional or long-term)
default probability is:

\[
p_i = \mathbb{E}_X\left[\mathbf{1}\{\tau_i \le T_i\}\right] = \mathbb{E}_X\left[p_i(X)\right]
\]

We also introduce the notation \(D_i = \mathbf{1}\{\tau_i \le T_i\}\), which is the default indicator function.
Conditionally to the risk factors \(X\), \(D_i\) is a Bernoulli random variable with probability
\(p_i(X)\). If we consider the standard assumption that the loss given default is independent


55 For instance, the rating system of Crédit Agricole is: A+, A, B+, B, C+, C, C-, D+, D, D-, E+, E and
E- (source: Crédit Agricole, Annual Financial Report 2014, page 201).

56 This is the case of JPMorgan Chase & Co. (source: JPMorgan Chase & Co., Annual Report 2014, page
104).
TABLE 3.23: An example of internal rating system

| Rating | Degree of risk | Definition | Borrower category by self-assessment |
|---|---|---|---|
| 1 | No essential risk | Extremely high degree of certainty of repayment | Normal |
| 2 | Negligible risk | High degree of certainty of repayment | Normal |
| 3 | Some risk | Sufficient certainty of repayment | Normal |
| 4 (A, B, C) | Better than average | There is certainty of repayment but substantial changes in the environment in the future may have some impact on this uncertainty | Normal |
| 5 (A, B, C) | Average | There are no problems foreseeable in the future, but a strong likelihood of impact from changes in the environment | Normal |
| 6 (A, B, C) | Tolerable | There are no problems foreseeable in the future, but the future cannot be considered entirely safe | Normal |
| 7 | Lower than average | There are no problems at the current time but the financial position of the borrower is relatively weak | Normal |
| 8 (A, B) | Needs preventive management | There are problems with lending terms or fulfilment, or the borrower’s business conditions are poor or unstable, or there are other factors requiring careful management | Needs attention |
| 9 | Needs serious management | There is a high likelihood of bankruptcy in the future | In danger of bankruptcy |
| 10 (I) | Needs serious management | The borrower is in serious financial straits and “effectively bankrupt” | Effectively bankruptcy |
| 10 (II) | Needs serious management | The borrower is bankrupt | Bankrupt |

Source: Ieda et al. (2000).


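from the default time and we also assume that the default times are conditionally
independent⁵⁷, we obtain:

\[
\begin{aligned}
\mathbb{E}[L \mid X] &= \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \mathbb{E}[D_i \mid X] \\
&= \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i(X)
\end{aligned}
\tag{3.20}
\]

and⁵⁸:

\[
\begin{aligned}
\sigma^2(L \mid X) &= \mathbb{E}\left[L^2 \mid X\right] - \mathbb{E}^2[L \mid X] \\
&= \sum_{i=1}^{n} w_i^2 \left( \mathbb{E}\left[\mathrm{LGD}_i^2\right] \cdot \mathbb{E}\left[D_i^2 \mid X\right] - \mathbb{E}^2[\mathrm{LGD}_i] \cdot p_i^2(X) \right)
\end{aligned}
\]

57 The default times are not independent, because they depend on the common risk factors X. However,
conditionally to these factors, they become independent because idiosyncratic risk factors are not correlated.

58 Because the conditional covariance between \(D_i\) and \(D_j\) is equal to zero. The derivation of this formula
is given in Exercise 3.4.8 on page 255.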
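We have \(\mathbb{E}\left[D_i^2 \mid X\right] = p_i(X)\) and \(\mathbb{E}\left[\mathrm{LGD}_i^2\right] = \sigma^2(\mathrm{LGD}_i) + \mathbb{E}^2[\mathrm{LGD}_i]\). We deduce that:

\[
\sigma^2(L \mid X) = \sum_{i=1}^{n} w_i^2 A_i
\tag{3.21}
\]

where:

\[
A_i = \mathbb{E}^2[\mathrm{LGD}_i] \cdot p_i(X) \cdot \left(1 - p_i(X)\right) + \sigma^2(\mathrm{LGD}_i) \cdot p_i(X)
\]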


BCBS (2004a) assumes that the portfolio is infinitely fine-grained, which means that there
is no concentration risk:

\[
\lim_{n \to \infty} \max_{i} \frac{w_i}{\sum_{j=1}^{n} w_j} = 0
\tag{3.22}
\]

In this case, Gordy (2003) shows that the conditional distribution of L degenerates to its 
conditional expectation E [L | X]. The intuition of this result is given by Wilde (2001a). 
He considers a fine-grained portfolio equivalent to the original portfolio by replacing the 
original credit i by m credits with the same default probability p;, the same loss given 
default LGD; but an exposure at default divided by m. Let Lm be the loss of the equivalent 
fine-grained portfolio. We have: 


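\[
\begin{aligned}
\mathbb{E}[L_m \mid X] &= \sum_{i=1}^{n} \left( \sum_{j=1}^{m} \frac{w_i}{m} \right) \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \mathbb{E}[D_i \mid X] \\
&= \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i(X) \\
&= \mathbb{E}[L \mid X]
\end{aligned}
\]

and:

\[
\begin{aligned}
\sigma^2(L_m \mid X) &= \sum_{i=1}^{n} \left( \sum_{j=1}^{m} \frac{w_i^2}{m^2} \right) A_i \\
&= \frac{1}{m} \sum_{i=1}^{n} w_i^2 A_i \\
&= \frac{1}{m} \sigma^2(L \mid X)
\end{aligned}
\]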


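When \(m\) tends to \(\infty\), we obtain the infinitely fine-grained portfolio. We note that
\(\mathbb{E}[L_\infty \mid X] = \mathbb{E}[L \mid X]\) and \(\sigma^2(L_\infty \mid X) = 0\). Conditionally to the risk factors \(X\), the
portfolio loss \(L_\infty\) is equal to the conditional mean \(\mathbb{E}[L \mid X]\). The associated probability
distribution \(F\) is then:

\[
\begin{aligned}
F(\ell) &= \Pr\{L_\infty \le \ell\} \\
&= \Pr\{\mathbb{E}[L \mid X] \le \ell\} \\
&= \Pr\left\{ \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i(X) \le \ell \right\}
\end{aligned}
\]

Let \(g(x)\) be the function \(\sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i(x)\). We have:

\[
F(\ell) = \int \cdots \int \mathbf{1}\{g(x) \le \ell\} \, \mathrm{d}H(x)
\]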
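However, it is not possible to obtain a closed-form formula for the value-at-risk \(F^{-1}(\alpha)\)
defined as follows:

\[
F^{-1}(\alpha) = \left\{ \ell : \Pr\{g(X) \le \ell\} = \alpha \right\}
\]

If we consider a single risk factor and assume that \(g(x)\) is an increasing function, we obtain:

\[
\begin{aligned}
\Pr\{g(X) \le \ell\} = \alpha &\Leftrightarrow \Pr\left\{X \le g^{-1}(\ell)\right\} = \alpha \\
&\Leftrightarrow H\left(g^{-1}(\ell)\right) = \alpha \\
&\Leftrightarrow \ell = g\left(H^{-1}(\alpha)\right)
\end{aligned}
\]

We finally deduce that the value-at-risk has the following expression:

\[
F^{-1}(\alpha) = g\left(H^{-1}(\alpha)\right) = \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i\left(H^{-1}(\alpha)\right)
\tag{3.23}
\]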


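Equation (3.23) is appealing because the value-at-risk satisfies the Euler allocation principle.
Indeed, we have:

\[
\begin{aligned}
\mathrm{RC}_i &= w_i \cdot \frac{\partial F^{-1}(\alpha)}{\partial w_i} \\
&= w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i\left(H^{-1}(\alpha)\right)
\end{aligned}
\tag{3.24}
\]

and:

\[
\sum_{i=1}^{n} \mathrm{RC}_i = F^{-1}(\alpha)
\]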


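Remark 30 If \(g(x)\) is a decreasing function, we obtain \(\Pr\left\{X \ge g^{-1}(\ell)\right\} = \alpha\) and:

\[
F^{-1}(\alpha) = \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i\left(H^{-1}(1 - \alpha)\right)
\]

The risk contribution becomes:

\[
\mathrm{RC}_i = w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i\left(H^{-1}(1 - \alpha)\right)
\tag{3.25}
\]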


We reiterate that Equation (3.24) has been obtained under the following assumptions:

H1 the loss given default \(\mathrm{LGD}_i\) is independent from the default time \(\tau_i\);

H2 the default times \((\tau_1, \ldots, \tau_n)\) depend on a single risk factor \(X\) and are conditionally
independent with respect to \(X\);

H3 the portfolio is infinitely fine-grained, meaning that there is no exposure concentration.


Equation (3.24) is a very important result for two main reasons. First, it implies that,
under the previous assumptions, the value-at-risk of an infinitely fine-grained portfolio can
be decomposed as a sum of independent risk contributions. Indeed, \(\mathrm{RC}_i\) depends solely on
the characteristics of the i-th credit (exposure at default, loss given default and probability
of default). This facilitates the calculation of the value-at-risk for large portfolios. Second,
the risk contribution \(\mathrm{RC}_i\) is related to the expected value of the loss given default. We don’t
need to model the probability distribution of \(\mathrm{LGD}_i\), only the mean \(\mathbb{E}[\mathrm{LGD}_i]\) is taken into
account.
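Closed-form formula of the value-at-risk In order to obtain a closed-form formula,
we need a model of default times. BCBS (2004a) has selected the one-factor model of Merton
(1974), which has been formalized by Vasicek (1991). Let \(Z_i\) be the normalized asset value
of the entity \(i\). In the Merton model, the default occurs when \(Z_i\) is below a given barrier
\(B_i\):

\[
D_i = 1 \Leftrightarrow Z_i < B_i
\]

By assuming that \(Z_i\) is Gaussian, we deduce that:

\[
\begin{aligned}
p_i &= \Pr\{D_i = 1\} \\
&= \Pr\{Z_i < B_i\} \\
&= \Phi(B_i)
\end{aligned}
\]

The value of the barrier \(B_i\) is then equal to \(\Phi^{-1}(p_i)\). We assume that the asset value \(Z_i\)
depends on the common risk factor \(X\) and an idiosyncratic risk factor \(\varepsilon_i\) as follows:

\[
Z_i = \sqrt{\rho} X + \sqrt{1 - \rho} \, \varepsilon_i
\]

\(X\) and \(\varepsilon_i\) are two independent standard normal random variables. We note that⁵⁹:

\[
\begin{aligned}
\mathbb{E}[Z_i Z_j] &= \mathbb{E}\left[ \left(\sqrt{\rho} X + \sqrt{1 - \rho} \, \varepsilon_i\right) \left(\sqrt{\rho} X + \sqrt{1 - \rho} \, \varepsilon_j\right) \right] \\
&= \mathbb{E}\left[ \rho X^2 + (1 - \rho) \, \varepsilon_i \varepsilon_j + X \sqrt{\rho (1 - \rho)} \left(\varepsilon_i + \varepsilon_j\right) \right] \\
&= \rho
\end{aligned}
\]

where \(\rho\) is the constant asset correlation. We now calculate the conditional default
probability:

\[
\begin{aligned}
p_i(X) &= \Pr\{D_i = 1 \mid X\} \\
&= \Pr\{Z_i < B_i \mid X\} \\
&= \Pr\left\{ \sqrt{\rho} X + \sqrt{1 - \rho} \, \varepsilon_i < B_i \right\} \\
&= \Pr\left\{ \varepsilon_i < \frac{B_i - \sqrt{\rho} X}{\sqrt{1 - \rho}} \right\} \\
&= \Phi\left( \frac{B_i - \sqrt{\rho} X}{\sqrt{1 - \rho}} \right)
\end{aligned}
\]

and therefore:

\[
\begin{aligned}
g(x) &= \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i(x) \\
&= \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \Phi\left( \frac{\Phi^{-1}(p_i) - \sqrt{\rho} x}{\sqrt{1 - \rho}} \right)
\end{aligned}
\]

We note that \(g(x)\) is a decreasing function if \(w_i > 0\). Using Equation (3.25) and the
relationship \(\Phi^{-1}(1 - \alpha) = -\Phi^{-1}(\alpha)\), it follows that:

\[
\mathrm{RC}_i = w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \Phi\left( \frac{\Phi^{-1}(p_i) + \sqrt{\rho} \, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}} \right)
\tag{3.26}
\]

59 We have \(\mathbb{E}[\varepsilon_i \varepsilon_j] = 0\) because \(\varepsilon_i\) and \(\varepsilon_j\) are two specific risk factors.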
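Equation (3.26) is straightforward to evaluate numerically. The following Python sketch is only an illustration under the assumptions H1 to H3: the function names are hypothetical, and the standard normal distribution comes from the `statistics` module of the standard library. It also checks that the risk contribution is the conditional default probability evaluated at the quantile \(\Phi^{-1}(1 - \alpha)\) of the common factor, as stated in Equation (3.25).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian distribution (mean 0, std 1)


def conditional_pd(p, rho, x):
    """Conditional default probability p_i(x) in the one-factor model."""
    barrier = STD_NORMAL.inv_cdf(p)
    return STD_NORMAL.cdf((barrier - sqrt(rho) * x) / sqrt(1.0 - rho))


def risk_contribution(w, expected_lgd, p, rho, alpha):
    """Equation (3.26): w_i * E[LGD_i] * Phi((Phi^-1(p_i) + sqrt(rho) Phi^-1(alpha)) / sqrt(1 - rho))."""
    z = (STD_NORMAL.inv_cdf(p) + sqrt(rho) * STD_NORMAL.inv_cdf(alpha)) / sqrt(1.0 - rho)
    return w * expected_lgd * STD_NORMAL.cdf(z)


# Illustrative values: EAD of 1, expected LGD of 50%, PD of 5%, rho = 10%, alpha = 90%
rc = risk_contribution(1.0, 0.50, 0.05, 0.10, 0.90)
# Same quantity via Equation (3.25), with the factor set to its (1 - alpha) quantile
x_stress = STD_NORMAL.inv_cdf(1.0 - 0.90)
rc_check = 1.0 * 0.50 * conditional_pd(0.05, 0.10, x_stress)
print(rc, rc_check)
```

Because each risk contribution depends only on the parameters of the credit itself, the value-at-risk of a large portfolio is obtained by summing these terms credit by credit.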
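Remark 31 We verify that \(p_i\) is the unconditional default probability. Indeed, we have:

\[
\begin{aligned}
\mathbb{E}_X[p_i(X)] &= \mathbb{E}_X\left[ \Phi\left( \frac{\Phi^{-1}(p_i) - \sqrt{\rho} X}{\sqrt{1 - \rho}} \right) \right] \\
&= \int_{-\infty}^{\infty} \Phi\left( \frac{\Phi^{-1}(p_i) - \sqrt{\rho} x}{\sqrt{1 - \rho}} \right) \phi(x) \, \mathrm{d}x
\end{aligned}
\]

We recognize the integral function analyzed in Appendix A.2.2.5 on page 1063. We deduce
that:

\[
\begin{aligned}
\mathbb{E}_X[p_i(X)] &= \Phi_2\left(\infty, \Phi^{-1}(p_i); \sqrt{\rho}\right) \\
&= \Phi\left(\Phi^{-1}(p_i)\right) \\
&= p_i
\end{aligned}
\]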

Example 29 We consider a homogeneous portfolio with 100 credits. For each credit, the
exposure at default, the expected LGD and the probability of default are set to $1 mn, 50%
and 5%.

Let us assume that the asset correlation \(\rho\) is equal to 10%. We have reported the
numerical values of \(F^{-1}(\alpha)\) for different values of \(\alpha\) in Table 3.24. If we are interested in the
cumulative distribution function, \(F(\ell)\) is equal to the numerical solution \(\alpha\) of the equation
\(F^{-1}(\alpha) = \ell\). Using a bisection algorithm, we find the probabilities given in Table 3.24. For
instance, the probability to have a loss less than or equal to $3 mn is equal to 70.44%.
Finally, to calculate the probability density function of the portfolio loss, we use the following
relationship⁶⁰:

\[
f(\ell) = \frac{1}{\partial_\alpha F^{-1}\left(F(\ell)\right)}
\]

where:

\[
\partial_\alpha F^{-1}(\alpha) = \sum_{i=1}^{n} w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \sqrt{\frac{\rho}{1 - \rho}} \cdot \frac{1}{\phi\left(\Phi^{-1}(\alpha)\right)} \cdot \phi\left( \frac{\Phi^{-1}(p_i) + \sqrt{\rho} \, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}} \right)
\]

In Figure 3.21, we compare the probability functions for two different values of the asset
correlation \(\rho\). We note that the level of \(\rho\) has a big impact on the quantile function and the
shape of the density function.

TABLE 3.24: Numerical values of \(f(\ell)\), \(F(\ell)\) and \(F^{-1}(\alpha)\) when \(\rho\) is equal to 10%

ℓ        (in $ mn)     0.10    1.00    2.00    3.00    4.00    5.00
F(ℓ)     (in %)        0.03   16.86   47.98   70.44   83.80   91.26
f(ℓ)     (in %)        1.04   31.19   27.74   17.39    9.90    5.43
α        (in %)       10.00   25.00   50.00   75.00   90.00   95.00
F⁻¹(α)   (in $ mn)     0.77    1.25    2.07    3.28    4.78    5.90

60 See Appendix A.2.2.3 on page 1062.
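The three functions of Example 29 can be sketched in Python as follows. The portfolio parameters are those of the example; the function names, the bisection bounds and the tolerance are choices made for this illustration only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
N_CREDITS, EAD, LGD, PD = 100, 1.0, 0.50, 0.05  # homogeneous portfolio of Example 29


def loss_quantile(alpha, rho):
    """Quantile F^-1(alpha) of the portfolio loss (in $ mn), sum of Equation (3.26)."""
    z = (STD_NORMAL.inv_cdf(PD) + sqrt(rho) * STD_NORMAL.inv_cdf(alpha)) / sqrt(1.0 - rho)
    return N_CREDITS * EAD * LGD * STD_NORMAL.cdf(z)


def loss_cdf(loss, rho, tol=1e-12):
    """Solve F^-1(alpha) = loss for alpha by bisection (F^-1 is increasing in alpha)."""
    low, high = 1e-12, 1.0 - 1e-12
    while high - low > tol:
        mid = 0.5 * (low + high)
        if loss_quantile(mid, rho) < loss:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def loss_density(loss, rho):
    """Density f(loss) = 1 / (d F^-1 / d alpha) evaluated at alpha = F(loss)."""
    alpha = loss_cdf(loss, rho)
    q = STD_NORMAL.inv_cdf(alpha)
    z = (STD_NORMAL.inv_cdf(PD) + sqrt(rho) * q) / sqrt(1.0 - rho)
    slope = N_CREDITS * EAD * LGD * sqrt(rho / (1.0 - rho)) * STD_NORMAL.pdf(z) / STD_NORMAL.pdf(q)
    return 1.0 / slope


for loss in (1.0, 3.0, 5.0):
    print(loss, round(100 * loss_cdf(loss, 0.10), 2), round(100 * loss_density(loss, 0.10), 2))
```

Calling the same functions with another value of `rho` gives the second set of curves compared in the figure.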
FIGURE 3.21: Probability functions of the credit portfolio loss


The risk contribution \(\mathrm{RC}_i\) depends on three credit parameters (the exposure at default
\(w_i\), the expected loss given default \(\mathbb{E}[\mathrm{LGD}_i]\) and the probability of default \(p_i\)) and two
model parameters (the asset correlation \(\rho\) and the confidence level \(\alpha\) of the value-at-risk). It
is obvious that \(\mathrm{RC}_i\) is an increasing function of the different parameters with the exception
of the correlation. We obtain:

\[
\operatorname{sign} \frac{\partial \, \mathrm{RC}_i}{\partial \, \rho} = \operatorname{sign} \left( \frac{1}{2 (1 - \rho)^{3/2}} \left( \Phi^{-1}(p_i) + \frac{\Phi^{-1}(\alpha)}{\sqrt{\rho}} \right) \right)
\]

We deduce that the risk contribution is not a monotone function with respect to \(\rho\). It
increases if the term \(\sqrt{\rho} \, \Phi^{-1}(p_i) + \Phi^{-1}(\alpha)\) is positive. This implies that the risk contribution
may decrease if the probability of default is very low and the confidence level is larger than
50%. The two limiting cases are \(\rho = 0\) and \(\rho = 1\). In the first case, the risk contribution is
equal to the expected loss:

\[
\lim_{\rho \to 0} \mathrm{RC}_i = w_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i
\]

In the second case, the risk contribution depends on the value of the probability of default:

\[
\lim_{\rho \to 1} \mathrm{RC}_i =
\begin{cases}
0 & \text{if } p_i < 1 - \alpha \\
0.5 \cdot w_i \cdot \mathbb{E}[\mathrm{LGD}_i] & \text{if } p_i = 1 - \alpha \\
w_i \cdot \mathbb{E}[\mathrm{LGD}_i] & \text{if } p_i > 1 - \alpha
\end{cases}
\]

The behavior of the risk contribution is illustrated in Figure 3.22 with the following base
parameter values: \(w_i = 100\), \(\mathbb{E}[\mathrm{LGD}_i] = 70\%\), \(\rho = 20\%\) and \(\alpha = 90\%\). We verify that the
risk contribution is an increasing function of \(\mathbb{E}[\mathrm{LGD}_i]\) (top/left panel) and \(\alpha\) (top/right
panel). When \(p_i\) and \(\alpha\) are set to 10% and 90%, the risk contribution increases with \(\rho\) and
reaches the value 35, which corresponds to half of the nominal loss given default. When
\(p_i\) and \(\alpha\) are set to 5% and 90%, the risk contribution first increases and then
decreases (bottom/left panel). The maximum is reached for the value⁶¹ \(\rho^{\star} = 60.70\%\). When
\(\alpha\) is equal to 99%, this behavior vanishes (bottom/right panel).
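The non-monotonic behavior with respect to the asset correlation can be explored with the short Python sketch below. It reuses Equation (3.26) and the expression of the critical correlation given in the footnote; the function names are hypothetical, and the critical value is only meaningful when \(p_i < 1 - \alpha < 50\%\), which is the case studied here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def risk_contribution(w, expected_lgd, p, rho, alpha):
    """Risk contribution of Equation (3.26)."""
    z = (STD_NORMAL.inv_cdf(p) + sqrt(rho) * STD_NORMAL.inv_cdf(alpha)) / sqrt(1.0 - rho)
    return w * expected_lgd * STD_NORMAL.cdf(z)


def critical_correlation(p, alpha):
    """Correlation at which sqrt(rho) * Phi^-1(p) + Phi^-1(alpha) changes sign (p != 50%)."""
    ratio = -STD_NORMAL.inv_cdf(alpha) / STD_NORMAL.inv_cdf(p)
    return max(0.0, ratio) ** 2


# Base case of the text: w = 100, E[LGD] = 70%, p = 5%, alpha = 90%
rho_star = critical_correlation(0.05, 0.90)
for rho in (0.20, 0.40, rho_star, 0.80, 0.95):
    print(round(rho, 4), round(risk_contribution(100.0, 0.70, 0.05, rho, 0.90), 3))
```

The printed values rise up to the critical correlation and fall afterwards, in line with the limit of zero obtained when \(\rho\) tends to one and \(p_i < 1 - \alpha\).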
FIGURE 3.22: Relationship between the risk contribution \(\mathrm{RC}_i\) and model parameters


In this model, the maturity \(T_i\) is taken into account through the probability of default.
Indeed, we have \(p_i = \Pr\{\tau_i \le T_i\}\). Let us denote \(\mathrm{PD}_i\) the annual default probability of the
obligor. If we assume that the default time is Markovian, we have the following relationship:

\[
\begin{aligned}
p_i &= 1 - \Pr\{\tau_i > T_i\} \\
&= 1 - (1 - \mathrm{PD}_i)^{T_i}
\end{aligned}
\]

We can then rewrite Equation (3.26) such that the risk contribution depends on the exposure
at default, the expected loss given default, the annualized probability of default and the
maturity, which are the four parameters of the IRB approach.
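As a small illustration of this relationship, the following Python sketch converts an annual default probability into the default probability over the maturity of the credit. The function name and the 2% annual probability are assumptions chosen for the example.

```python
def cumulative_default_probability(annual_pd, maturity):
    """p_i = 1 - (1 - PD_i)^{T_i}: probability of defaulting before the maturity T_i (in years)."""
    if not 0.0 <= annual_pd <= 1.0:
        raise ValueError("annual_pd must lie between 0 and 1")
    if maturity < 0.0:
        raise ValueError("maturity must be non-negative")
    return 1.0 - (1.0 - annual_pd) ** maturity


# Illustrative obligor with an annual default probability of 2%
for years in (1, 3, 5):
    print(years, round(cumulative_default_probability(0.02, years), 6))
```

The resulting probability can be used as the input \(p_i\) of Equation (3.26), so that longer maturities mechanically produce larger risk contributions.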


3.2.3.3 The IRB formulas 


A long process to obtain the finalized formulas The IRB formula of the second
consultative paper was calibrated with \(\alpha = 99.5\%\), \(\rho = 20\%\) and a standard maturity
of three years. To measure the impact of this approach, the Basel Committee conducted a
quantitative impact study (QIS) in April 2001. A QIS is an Excel workbook to be filled in by the
bank. It allows the Basel Committee to gauge the impact of the different proposals for capital
requirements. The answers are then gathered and analyzed at the industry level. Results
were published in November 2001. Overall, 138 banks from 25 countries participated in the
QIS. Not all participating banks managed to calculate the capital requirements under the
three methods (SA, FIRB and AIRB). However, 127 banks provided complete information
on the SA approach and 55 banks on the FIRB approach. Only 22 banks were able to
calculate the AIRB approach for all portfolios.

61 We have:

\[
\rho^{\star} = \max{}^2 \left( 0, -\frac{\Phi^{-1}(\alpha)}{\Phi^{-1}(p_i)} \right) = \left( \frac{1.282}{1.645} \right)^2 = 60.70\%
\]


TABLE 3.25: Percentage change in capital requirements under CP2 proposals

                      SA     FIRB    AIRB
G10      Group 1      6%     14%     −5%
         Group 2      1%
EU       Group 1      6%     10%     −1%
         Group 2     −1%
Others                5%

Source: BCBS (2001b).


In Table 3.25, we report the difference in capital requirements between CP2 proposals
and Basel I. Group 1 corresponds to diversified, internationally active banks with tier 1
capital of at least €3 bn whereas Group 2 consists of smaller or more specialized banks.
BCBS (2001b) concluded that “on average, the QIS2 results indicate that the CP2 proposals
for credit risk would deliver an increase in capital requirements for all groups under both
the SA and FIRB approaches”. It was obvious that these figures were not satisfactory. The
Basel Committee then considered several modifications in order to (1) maintain equivalence
on average between current required capital and the revised SA approach and (2) provide
incentives under the FIRB approach. A third motivation rapidly emerged. According
to many studies⁶², Basel II may considerably increase the procyclicality of capital
requirements. Indeed, capital requirements may increase in an economic meltdown, because LGD
increases in bad times and credits receive lower ratings. In this case, capital requirements
may move in the opposite direction to the macroeconomic cycle, leading banks to reduce
their supply of credit during a crisis. In this scenario, Basel II proposals may amplify credit
crises and economic downturns. All these reasons explain the long period needed to finalize the
Basel II Accord. After two new QIS (QIS 2.5 in July 2002 and QIS 3 in May 2003) and
a troubled period at the end of 2003, the new Capital Accord was finally published in June
2004. However, there was a shared feeling that it was more a compromise than a finished
task. Thus, several issues remained unresolved and two new QIS were conducted in 2004
and 2005 before the implementation in order to confirm the calibration.


The supervisory formula If we use the notations of the Basel Committee, the risk
contribution has the following expression:

\[
\mathrm{RC} = \mathrm{EAD} \cdot \mathrm{LGD} \cdot \Phi\left( \frac{\Phi^{-1}\left(1 - (1 - \mathrm{PD})^{M}\right) + \sqrt{\rho} \, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}} \right)
\]

where EAD is the exposure at default, LGD is the (expected) loss given default, PD is the
(one-year) probability of default and M is the effective maturity. Because RC is directly the
capital requirement (\(\mathrm{RC} = 8\% \times \mathrm{RWA}\)), we deduce that the risk-weighted asset amount is
equal to:

\[
\mathrm{RWA} = 12.50 \cdot \mathrm{EAD} \cdot K^{\star}
\tag{3.27}
\]

where \(K^{\star}\) is the normalized required capital for a unit exposure:

\[
K^{\star} = \mathrm{LGD} \cdot \Phi\left( \frac{\Phi^{-1}\left(1 - (1 - \mathrm{PD})^{M}\right) + \sqrt{\rho} \, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}} \right)
\tag{3.28}
\]

62 See for instance Goodhart et al. (2004) or Kashyap and Stein (2004).


In order to obtain the finalized formulas, the Basel Committee has introduced the following
modifications:

• a maturity adjustment \(\varphi(M)\) has been added in order to separate the impact of the
one-year probability of default and the effect of the maturity; the function \(\varphi(M)\) has
then been calibrated such that Expression (3.28) becomes:

\[
K^{\star} \approx \mathrm{LGD} \cdot \Phi\left( \frac{\Phi^{-1}(\mathrm{PD}) + \sqrt{\rho} \, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}} \right) \cdot \varphi(M)
\tag{3.29}
\]

• it has used a confidence level of 99.9% instead of the 99.5% value;

• it has defined a parametric function \(\rho(\mathrm{PD})\) for the default correlation in order that
low ratings are not too penalizing for capital requirements;

• it has considered the unexpected loss as the credit risk measure:

\[
\mathrm{UL}_{\alpha} = \mathrm{VaR}_{\alpha} - \mathbb{E}[L]
\]

In summary, the risk-weighted asset amount in the IRB approach is calculated using
Equation (3.27) and the following normalized required capital:

\[
K^{\star} = \left( \mathrm{LGD} \cdot \Phi\left( \frac{\Phi^{-1}(\mathrm{PD}) + \sqrt{\rho(\mathrm{PD})} \, \Phi^{-1}(99.9\%)}{\sqrt{1 - \rho(\mathrm{PD})}} \right) - \mathrm{LGD} \cdot \mathrm{PD} \right) \cdot \varphi(M)
\tag{3.30}
\]


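Risk-weighted assets for corporate, sovereign, and bank exposures The three
asset classes use the same formula:

\[
K^{\star} = \left( \mathrm{LGD} \cdot \Phi\left( \frac{\Phi^{-1}(\mathrm{PD}) + \sqrt{\rho(\mathrm{PD})} \, \Phi^{-1}(99.9\%)}{\sqrt{1 - \rho(\mathrm{PD})}} \right) - \mathrm{LGD} \cdot \mathrm{PD} \right) \cdot \left( \frac{1 + (M - 2.5) \cdot b(\mathrm{PD})}{1 - 1.5 \cdot b(\mathrm{PD})} \right)
\tag{3.31}
\]

with \(b(\mathrm{PD}) = \left(0.11852 - 0.05478 \cdot \ln(\mathrm{PD})\right)^2\) and:

\[
\rho(\mathrm{PD}) = 12\% \times \left( \frac{1 - e^{-50 \times \mathrm{PD}}}{1 - e^{-50}} \right) + 24\% \times \left( 1 - \frac{1 - e^{-50 \times \mathrm{PD}}}{1 - e^{-50}} \right)
\tag{3.32}
\]

We note that the maturity adjustment \(\varphi(M)\) is equal to one, and therefore vanishes, when the
effective maturity is one year. For a defaulted exposure, we have:

\[
K^{\star} = \max\left(0, \mathrm{LGD} - \mathrm{EL}\right)
\]

where EL is the bank’s best estimate of the expected loss⁶³.

63 We can assimilate it to specific provisions.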
For small and medium-sized enterprises⁶⁴, a firm-size adjustment is introduced by
defining a new parametric function for the default correlation:

\[
\rho^{\mathrm{SME}}(\mathrm{PD}) = \rho(\mathrm{PD}) - 0.04 \cdot \left( 1 - \frac{\max(S, 5) - 5}{45} \right)
\]

where \(S\) is the reported sales expressed in € mn. This adjustment has the effect of reducing
the default correlation and then the risk-weighted assets. Similarly, the Basel Committee
proposes specific arrangements for specialized lending and high-volatility commercial real
estate (HVCRE).


In the foundation IRB approach, the bank estimates the probability of default, but uses 
standard values for the other parameters. In the advanced IRB approach, the bank always 
estimates the parameters PD and M, and may use its own estimates for the parameters 
EAD and LGD subject to certain minimum requirements. The risk components are defined 
as follows: 


1. The exposure at default is the amount of the claim, without taking into account 
specific provisions or partial write-offs. For off-balance sheet positions, the bank uses 
similar credit conversion factors for the FIRB approach as for the SA approach. In 
the AIRB approach, the bank may use its own internal measures of CCF. 


2. In the FIRB approach, the loss given default is set to 45% for senior claims and 
75% for subordinated claims. In the AIRB approach, the bank may use its own es- 
timates of LGD. However, they must be conservative and take into account adverse 
economic conditions. Moreover, they must include all recovery costs (litigation cost, 
administrative cost, etc.). 


3. PD is the one-year probability of default calculated with the internal rating system. 
For corporate and bank exposures, a floor of 3 bps is applied. 


4. The maturity is set to 2.5 years in the FIRB approach. In the advanced approach, M 
is the weighted average time of the cash flows, with a one-year floor and a five-year 
cap. 


Example 30 We consider a senior debt of $3 mn on a corporate firm. The residual maturity 
of the debt is equal to 2 years. We estimate the one-year probability of default at 5%. 


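To determine the capital charge, we first calculate the default correlation:

\[
\begin{aligned}
\rho(\mathrm{PD}) &= 12\% \times \left( \frac{1 - e^{-50 \times 0.05}}{1 - e^{-50}} \right) + 24\% \times \left( 1 - \frac{1 - e^{-50 \times 0.05}}{1 - e^{-50}} \right) \\
&= 12.985\%
\end{aligned}
\]

We have:

\[
\begin{aligned}
b(\mathrm{PD}) &= \left(0.11852 - 0.05478 \times \ln(0.05)\right)^2 \\
&= 0.0799
\end{aligned}
\]

It follows that the maturity adjustment is equal to:

\[
\begin{aligned}
\varphi(M) &= \frac{1 + (2 - 2.5) \times 0.0799}{1 - 1.5 \times 0.0799} \\
&= 1.0908
\end{aligned}
\]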

64They are defined as corporate entities where the reported sales for the consolidated group of which the 
firm is a part is less than € 50 mn. 
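The normalized capital charge with a one-year maturity is:

\[
\begin{aligned}
K^{\star} &= 45\% \times \Phi\left( \frac{\Phi^{-1}(5\%) + \sqrt{12.985\%} \, \Phi^{-1}(99.9\%)}{\sqrt{1 - 12.985\%}} \right) - 45\% \times 5\% \\
&= 0.1055
\end{aligned}
\]

When the maturity is two years, we obtain:

\[
\begin{aligned}
K^{\star} &= 0.1055 \times 1.0908 \\
&= 0.1151
\end{aligned}
\]

We deduce the value taken by the risk weight:

\[
\begin{aligned}
\mathrm{RW} &= 12.5 \times 0.1151 \\
&= 143.87\%
\end{aligned}
\]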

It follows that the risk-weighted asset amount is equal to $4.316 mn whereas the capital 
charge is $345 287. Using the same process, we have calculated the risk weight for different 
values of PD, LGD and M in Table 3.26. The last two columns are for a SME claim by 
considering that sales are equal to €5 mn. 


TABLE 3.26: IRB risk weights (in %) for corporate exposures

Maturity            M = 1           M = 2.5        M = 2.5 (SME)
LGD               45%    75%      45%    75%      45%    75%
PD (in %)
    0.10         18.7   31.1     29.7   49.4     23.3   38.8
    0.50         52.2   86.9     69.6  116.0     54.9   91.5
    1.00         73.3  122.1     92.3  153.9     72.4  120.7
    2.00         95.8  159.6    114.9  191.4     88.5  147.6
    5.00        131.9  219.8    149.9  249.8    112.3  187.1
   10.00        175.8  292.9    193.1  321.8    146.5  244.2
   20.00        223.0  371.6    238.2  397.1    188.4  314.0
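The calculation of Example 30 and of the risk weights for corporate exposures can be sketched in Python as follows. The function names are hypothetical, the regulatory constants are those of the formulas above (correlation bounds of 12% and 24%, confidence level of 99.9%), and the cap of the sales figure at €50 mn is an assumption added here so that the firm-size adjustment never increases the correlation.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def corporate_correlation(prob_default, sales=None):
    """Default correlation rho(PD), with the optional SME firm-size adjustment (sales in EUR mn)."""
    weight = (1.0 - exp(-50.0 * prob_default)) / (1.0 - exp(-50.0))
    rho = 0.12 * weight + 0.24 * (1.0 - weight)
    if sales is not None:
        size = min(max(sales, 5.0), 50.0)  # cap at 50 is an assumption of this sketch
        rho -= 0.04 * (1.0 - (size - 5.0) / 45.0)
    return rho


def maturity_adjustment(prob_default, maturity):
    """Maturity adjustment (1 + (M - 2.5) b(PD)) / (1 - 1.5 b(PD))."""
    b = (0.11852 - 0.05478 * log(prob_default)) ** 2
    return (1.0 + (maturity - 2.5) * b) / (1.0 - 1.5 * b)


def corporate_risk_weight(prob_default, lgd, maturity, sales=None):
    """Risk weight 12.5 x K* for a non-defaulted corporate exposure."""
    rho = corporate_correlation(prob_default, sales)
    z = (STD_NORMAL.inv_cdf(prob_default) + sqrt(rho) * STD_NORMAL.inv_cdf(0.999)) / sqrt(1.0 - rho)
    unit_capital = lgd * STD_NORMAL.cdf(z) - lgd * prob_default
    return 12.5 * unit_capital * maturity_adjustment(prob_default, maturity)


# Example 30: senior debt of $3 mn, PD = 5%, LGD = 45%, maturity of 2 years
rw = corporate_risk_weight(0.05, 0.45, 2.0)
rwa = 3.0 * rw
print(round(100 * rw, 2), round(rwa, 3), round(0.08 * rwa * 1e6))
```

Looping over the PD, LGD and maturity values of the table, with `sales=5.0` for the SME columns, should reproduce the reported risk weights up to rounding.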


Risk-weighted assets for retail exposures Claims can be included in the regulatory 
retail portfolio if they meet certain criteria: in particular, the exposure must be to an 
individual person or to a small business; it satisfies the granularity criterion, meaning that 
no aggregate exposure to one counterpart can exceed 0.2% of the overall regulatory retail 
portfolio; the aggregated exposure to one counterparty cannot exceed € 1 mn. In these cases, 
the bank uses the following IRB formula: 


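\[
K^{\star} = \mathrm{LGD} \cdot \Phi\left( \frac{\Phi^{-1}(\mathrm{PD}) + \sqrt{\rho(\mathrm{PD})} \, \Phi^{-1}(99.9\%)}{\sqrt{1 - \rho(\mathrm{PD})}} \right) - \mathrm{LGD} \cdot \mathrm{PD}
\tag{3.33}
\]

We note that this IRB formula corresponds to a one-year fixed maturity. The value of the
default correlation depends on the categories. For residential mortgage exposures, we have
\(\rho(\mathrm{PD}) = 15\%\) whereas the default correlation \(\rho(\mathrm{PD})\) is equal to 4% for qualifying revolving
retail exposures. For other retail exposures, it is defined as follows:

\[
\rho(\mathrm{PD}) = 3\% \times \left( \frac{1 - e^{-35 \times \mathrm{PD}}}{1 - e^{-35}} \right) + 16\% \times \left( 1 - \frac{1 - e^{-35 \times \mathrm{PD}}}{1 - e^{-35}} \right)
\]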

In Table 3.27, we report the corresponding risk weights for the three categories and for two 
different values of LGD. 
TABLE 3.27: IRB risk weights (in %) for retail exposures

                   Mortgage        Revolving       Other retail
LGD               45%    25%      45%    85%      45%    85%
PD (in %)
    0.10         10.7    5.9      2.7    5.1     11.2   21.1
    0.50         35.1   19.5     10.0   19.0     32.4   61.1
    1.00         56.4   31.3     17.2   32.5     45.8   86.5
    2.00         87.9   48.9     28.9   54.6     58.0  109.5
    5.00        148.2   82.3     54.7  103.4     66.4  125.5
   10.00        204.4  113.6     83.9  158.5     75.5  142.7
   20.00        253.1  140.6    118.0  222.9    100.3  189.4
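The retail risk weights can be computed in the same way as the corporate ones, without any maturity adjustment. The Python sketch below is an illustration only: the function names and the category labels (`"mortgage"`, `"revolving"`, `"other"`) are hypothetical, and the correlation values are those of the Basel II retail formulas (15%, 4%, and the interpolation between 3% and 16% for other retail exposures).

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def retail_correlation(prob_default, category):
    """Default correlation for the three retail categories."""
    if category == "mortgage":
        return 0.15
    if category == "revolving":
        return 0.04
    if category == "other":
        weight = (1.0 - exp(-35.0 * prob_default)) / (1.0 - exp(-35.0))
        return 0.03 * weight + 0.16 * (1.0 - weight)
    raise ValueError(f"unknown retail category: {category}")


def retail_risk_weight(prob_default, lgd, category):
    """Risk weight 12.5 x K* with K* given by Equation (3.33)."""
    rho = retail_correlation(prob_default, category)
    z = (STD_NORMAL.inv_cdf(prob_default) + sqrt(rho) * STD_NORMAL.inv_cdf(0.999)) / sqrt(1.0 - rho)
    return 12.5 * (lgd * STD_NORMAL.cdf(z) - lgd * prob_default)


# Risk weights (in %) for a PD of 1% and an LGD of 45%
for category in ("mortgage", "revolving", "other"):
    print(category, round(100 * retail_risk_weight(0.01, 0.45, category), 1))
```

Since the formula is linear in LGD, the columns of the table that share the same category differ only by the ratio of their LGD values.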


The other two pillars The first pillar of Basel II, which concerns minimum capital
requirements, is complemented by two new pillars. The second pillar is the supervisory review
process (SRP) and is composed of two main processes: the supervisory review and evaluation
process (SREP) and the internal capital adequacy assessment process (ICAAP). The SREP
defines the regulatory response to the first pillar, in particular the validation processes of
internal models. Nevertheless, the SREP is not limited to capital requirements. More
generally, the SREP evaluates the global strategy and resilience of the bank. ICAAP addresses
risks that are not captured in Pillar 1 like concentration risk or non-granular portfolios
in the case of credit risk⁶⁵. For instance, stress tests are part of Pillar 2. The goal of the
second pillar is then to encourage banks to continuously improve their internal models and
processes for assessing the adequacy of their capital and to ensure that supervisors have
the adequate tools to control them. The third pillar, which is also called market discipline,
requires banks to publish comprehensive information about their risk management process.
This is particularly true since the publication in January 2015 of the revised Pillar 3
disclosure requirements. Indeed, BCBS (2015a) imposes the use of templates for quantitative
disclosure with a fixed format in order to facilitate the comparison between banks.


3.2.4 The Basel III revision 


For credit risk capital requirements, Basel III is close to the Basel II framework with some
adjustments, which mainly concern the parameters⁶⁶. Indeed, the SA and IRB methods
continue to be the two approaches for computing the capital charge for credit risk.


3.2.4.1 The standardized approach 


Risk-weighted exposures External credit ratings continue to be the backbone of the
standardized approach in Basel III. Nevertheless, they are not the only tool for measuring
the absolute riskiness of debtors and loans. First, the Basel Committee recognizes that
external credit ratings are prohibited in some jurisdictions for computing regulatory capital.
For example, this is the case of the United States, which abandoned in 2010 the use of
commercial credit ratings after the Dodd-Frank reform. Second, the Basel Committee links
risk weights to the loan-to-value ratio (LTV) for some categories.

When external ratings are allowed⁶⁷, the Basel Committee defines a new table of risk
weights, which is close to the Basel II table. In Table 3.28, we indicate the main
categories and the risk weights associated to credit ratings. We notice that the risk weights for

65Since Basel III, ICAAP is completed by the internal liquidity adequacy assessment process (ILAAP). 
66 The Basel III framework for credit risk is described in BCBS (2017c). 
67This method is called the external credit risk assessment approach (ECRA). 
TABLE 3.28: Risk weights of the SA approach (ECRA, Basel III)

                      AAA      A+      BBB+     BB+     CCC+
Rating                to       to      to       to      to       NR
                      AA−      A−      BBB−     B−      C
Sovereigns             0%     20%      50%     100%    150%     100%
PSE          1        20%     50%     100%     100%    150%     100%
             2        20%     50%      50%     100%    150%      50%
MDB                   20%     30%      50%     100%    150%      50%
Banks        2        20%     30%      50%     100%    150%     SCRA
             2 ST     20%     20%      20%      50%    150%     SCRA
Covered               10%     20%      20%      50%    100%      (*)
Corporates            20%     50%      75%     100%    150%     100%
Retail                                  75%

(*) For unrated covered bonds, the risk weight is generally half of the risk weight of the issuing
bank.


sovereign exposures and non-central government public sector entities (PSE) are unchanged.
The risk weights for multilateral development banks (MDB) continue to be related to the
risk weights for banks. However, we notice that the first option is removed and we observe
some differences for exposures to banks. First, the risk weight for the category A+/A−
is reduced from 50% to 30%. Second, for unrated exposures, the standard figure of 50%
is replaced by the standardized credit risk approach (SCRA). Third, the Basel Committee
considers the special category of covered bonds, whose development has emerged after
the 2008 Global Financial Crisis and the introduction of capital requirements for systemic
risks⁶⁸. For exposures to corporates, the Basel Committee uses the same scale as for
other categories contrary to Basel II (see Table 3.16 on page 163). Finally, the risk weight
for retail exposures remains unchanged.


The standardized credit risk approach (SCRA) must be used for all exposures to banks 
in two situations: (1) when the exposure is unrated; (2) when external credit ratings are 
prohibited. In this case, the bank must conduct a due diligence analysis in order to classify 
the exposures into three grades: A, B, and C. Grade A refers to the most solid banks, whose 
capital exceeds the minimum regulatory capital requirements, whereas Grade C refers to 
the most vulnerable banks. The risk weight is respectively equal to 40%, 75% and 150% 
(20%, 50% and 150% for short-term exposures). 


When external credit ratings are prohibited, the risk weight of exposures to corporates is
equal to 100% with two exceptions. A 65% risk weight is assigned to corporates that can
be considered investment grade (IG). For exposures to small and medium-sized enterprises,
a 75% risk weight can be applied if the exposure can be classified in the retail category,
and an 85% risk weight is applied to the others.


The case of retail is particular because we have to distinguish real estate exposures
and other retail exposures. By default, the risk weight is equal to 75% for this last
category, which includes revolving credits, credit cards, consumer credit loans, auto
loans, student loans, etc. For real estate exposures, the risk weights depend on the
loan-to-value ratio (LTV). Suppose that someone borrows $100 000 to purchase a house worth
$150 000. The LTV ratio is then 100 000/150 000 or 66.67%. This ratio is extensively used
in English-speaking countries (e.g. the United States) to measure the risk of the loan.
The idea is that the lender’s haircut ($100 000 in our example) represents the lender
risk. If the borrower defaults, the lender recovers the property, which will be sold. The
risk is then to sell the property below the lender’s haircut. The higher the LTV ratio,
the riskier the loan is for the lender. In continental Europe, the risk of home property
loans is measured by the ability of the borrower to repay the capital and service his
debt. In this case, the risk of the loan is generally related to the income of the
borrower. It is obvious that these two methods for assessing the credit risk are
completely different and this explains the stress in Europe to adopt the LTV approach. In
Table 3.29, we have reported the value of risk weights with respect to the LTV (expressed
in %) in the case of residential real estate exposures. The Basel Committee considers two
categories, depending on whether the repayment depends on the cash flows generated by the
property (D) or not (ND). The risk weight ranges from 20% to 105% in Basel III, whereas it
was equal to 35% in Basel II.


68 See Chapter 8 on page 453.


TABLE 3.29: Risk weights of the SA approach (ECRA, Basel III)

Residential real estate                  Commercial real estate
Cash flows          ND      D            Cash flows          ND                 D
LTV ≤ 50            20%     30%          LTV ≤ 60            min(60%, RW_C)     70%
50 < LTV ≤ 60       25%     35%
60 < LTV ≤ 80       30%     45%          60 < LTV ≤ 80       RW_C               90%
80 < LTV ≤ 90       40%     60%
90 < LTV ≤ 100      50%     75%          LTV > 80            RW_C               110%
LTV > 100           70%     105%



The LTV ratio is also used to determine the risk weight of commercial real estate, 
land acquisition, development and construction exposures. Table 3.29 gives the risk weight 
for commercial real estate exposures. If the repayment does not depend on the cash flows 
generated by property (ND), we use the risk weight of the counterparty with a cap of 
60%. If the repayment depends on the cash flows generated by the property (D), the risk 
weight ranges from 70% to 110%, whereas it was equal to 100% in Basel II. Commercial real 
estate exposures that do not meet specific qualitative requirements will be risk-weighted at 
150%, which is also the default figure for land acquisition, development and construction 
exposures. 
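
The lookup described by Table 3.29 can be sketched in a few lines of Python. This is only an illustration: the function names are ours, the bucket bounds are assumed to be inclusive on the right (LTV ≤ 50, 50 < LTV ≤ 60, etc.), and the 150% treatment of exposures that fail the qualitative requirements is not modeled.

```python
from bisect import bisect_left

# Upper LTV bounds (in %) of the residential buckets of Table 3.29.
RES_BOUNDS = [50, 60, 80, 90, 100]
# Risk weights per bucket: repayment not dependent (ND) or dependent (D)
# on the cash flows generated by the property.
RES_RW_ND = [0.20, 0.25, 0.30, 0.40, 0.50, 0.70]
RES_RW_D = [0.30, 0.35, 0.45, 0.60, 0.75, 1.05]


def residential_rw(ltv_pct, cash_flow_dependent=False):
    """Risk weight of a residential real estate exposure (LTV in %)."""
    # bisect_left keeps the upper bound inside the bucket (assumption: LTV <= bound)
    bucket = bisect_left(RES_BOUNDS, ltv_pct)
    weights = RES_RW_D if cash_flow_dependent else RES_RW_ND
    return weights[bucket]


def commercial_rw(ltv_pct, rw_counterparty, cash_flow_dependent=False):
    """Risk weight of a commercial real estate exposure (LTV in %)."""
    if cash_flow_dependent:
        if ltv_pct <= 60:
            return 0.70
        return 0.90 if ltv_pct <= 80 else 1.10
    # ND case: counterparty risk weight, capped at 60% for low LTV ratios
    return min(0.60, rw_counterparty) if ltv_pct <= 60 else rw_counterparty


ltv = 100 * 100_000 / 150_000  # example of the text: 66.67%
print(f"LTV = {ltv:.2f}%")
print(f"Residential ND: {residential_rw(ltv):.0%}")
print(f"Residential D : {residential_rw(ltv, True):.0%}")
print(f"Commercial ND (RW_C = 100%): {commercial_rw(ltv, 1.0):.0%}")
```

For the borrower of the previous example (LTV of 66.67%), the sketch returns a risk weight of 30% if the repayment does not depend on the cash flows generated by the property and 45% otherwise.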

For off-balance sheet items, credit conversion factors (CCF) have been revised. They 
can take the values 10%, 20%, 40%, 50% and 100%. This is a more granular scale without 
the possibility to set the CCF to 0%. Generally speaking, the CCF values in Basel III are 
more conservative than in Basel II. 


Credit risk mitigation The regulatory framework for credit risk mitigation techniques 
changes very little from Basel II to Basel III: the two methods remain the simple and 
comprehensive approaches; the treatment of maturity mismatches is the same; the formulas 
for computing the risk weighted assets are identical, etc. Minor differences concern the 
description of eligible financial collateral and the haircut parameters, which are given in 
Table 3.30. For instance, we see that the Basel Committee makes the distinction of issuers 
for debt securities between sovereigns, other issuers and securitization exposures. While the 
haircuts do not change for sovereign debt securities with respect to Basel II, the scale is 
more granular for the two other categories. Haircuts are also increased by 5% for gold and 
equity collateral instruments. 

The major difference concerns the treatment of securities financing transactions (SFT)
such as repo-style transactions, since the Basel Committee has developed a specific approach
for calculating the modified exposure EAD* of these instruments in the comprehensive
approach (BCBS, 2017c, pages 43-47).


TABLE 3.30: Standardized supervisory haircuts for collateralized transactions (Basel III)

Rating          Residual maturity    Sovereigns    Others    Securitization exposures
AAA to AA−      0-1Y                 0.5%          1%        2%
                1-3Y                 2%            3%        8%
                3-5Y                 2%            4%        8%
                5Y-10Y               4%            6%        16%
                10Y+                 4%            12%       16%
A+ to BBB−      0-1Y                 1%            2%        4%
                1-3Y                 3%            4%        12%
                3-5Y                 3%            6%        12%
                5Y-10Y               6%            12%       24%
                10Y+                 6%            20%       24%
BB+ to BB−                           15%

Cash                                          0%
Gold                                          20%
Main index equities                           20%
Equities listed on a recognized exchange      30%
FX risk                                       8%


3.2.4.2 The internal ratings-based approach 


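The methodology of the IRB approach does not change with respect to Basel II, since
the formulas are the same⁶⁹. The only exception is the correlation parameter for bank
exposures⁷⁰, which becomes:

$$
\begin{aligned}
\rho(\mathrm{PD}) &= 1.25 \times \left( 12\% \times \frac{1 - e^{-50 \times \mathrm{PD}}}{1 - e^{-50}} + 24\% \times \left( 1 - \frac{1 - e^{-50 \times \mathrm{PD}}}{1 - e^{-50}} \right) \right) \\
&= 15\% \times \frac{1 - e^{-50 \times \mathrm{PD}}}{1 - e^{-50}} + 30\% \times \left( 1 - \frac{1 - e^{-50 \times \mathrm{PD}}}{1 - e^{-50}} \right)
\end{aligned}
\tag{3.35}
$$
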


Therefore, the correlation range for the bank category increases from 12% — 24% to 15% — 
30%. In fact, the main differences concern the computation of the LGD parameter, and 
the validation of the IRB approach, which is much more restrictive. For instance, the IRB 
approaches are not permitted for exposures to equities, and we cannot develop an AIRB 
approach for exposures to banks and exposures to corporates with annual revenues greater 
than €500 mn. For banks and large corporates, only the FIRB approach is available. 
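
As an illustration of Equation (3.35), the following Python sketch evaluates the correlation parameter for bank exposures. The function name and the flag `apply_multiplier` are ours; the flag only switches the 1.25 multiplier on or off and does not model the eligibility rule for applying it.

```python
import math


def bank_correlation(pd, apply_multiplier=True):
    """Asset correlation for bank exposures, following Equation (3.35)."""
    # Exponential weight that goes from 0 (PD = 0) to 1 (PD = 100%)
    w = (1.0 - math.exp(-50.0 * pd)) / (1.0 - math.exp(-50.0))
    rho = 0.12 * w + 0.24 * (1.0 - w)
    return 1.25 * rho if apply_multiplier else rho


for pd in (0.0005, 0.01, 0.05, 0.20):
    print(f"PD = {pd:.2%}: rho = {bank_correlation(pd):.2%} "
          f"(without multiplier: {bank_correlation(pd, False):.2%})")
```

The correlation decreases from 30% when the default probability is close to zero to 15% when it is very high, which is the range given in the text.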


The Basel Committee still considers five asset classes: corporates, sovereigns, banks,
retail and equities. In the FIRB approach, the bank estimates the PD parameter, while
it uses the regulatory estimates of EAD, LGD and M⁷¹. In the AIRB approach, the bank
estimates all the parameters, but they are subject to some input floors. For example, the
minimum PD is set to 5 bps for corporate and bank exposures.


69 This concerns Equation (3.27) for risk-weighted assets, Equations (3.31) and (3.32) for corporate,
sovereign, and bank exposures, Equations (3.33) and (3.34) for retail exposures, the maturity adjustment
b(PD), the correlation formula $\rho^{\mathrm{SME}}(\mathrm{PD})$ for SME exposures, the correlation parameters for retail
exposures, etc.

70 The multiplier of 1.25 is applied for regulated financial institutions with total assets larger than $100
bn and all unregulated financial institutions.

Certainly, LGD is the most challenging parameter in Basel III. In the FIRB approach,
the default values are 75% for subordinated claims, 45% for senior claims on financial
institutions and 40% for senior claims on corporates. When considering a collateral, the
LGD parameter becomes:

$$
\mathrm{LGD}^{\star} = \omega \cdot \mathrm{LGD} + (1 - \omega) \cdot \mathrm{LGD}_C
$$

where $\mathrm{LGD}$ and $\mathrm{LGD}_C$ apply to the unsecured exposure and the collateralized part, and $\omega$
is the relative weight between $\mathrm{LGD}$ and $\mathrm{LGD}_C$:

$$
\omega = 1 - \frac{(1 - H_C) \cdot C}{(1 + H_E) \cdot \mathrm{EAD}}
$$

Here, $H_E$ is the SA haircut for the exposure, $C$ is the value of the collateral, and $H_C$ is
the specific haircut for the collateral. $\mathrm{LGD}_C$ is equal to 0% for financial collateral, 20%
for receivables and real estate and 25% for other physical collateral, whereas $H_C$ ranges
from 0% to 100%. In the AIRB approach, the LGD parameter may be estimated by the
bank, under the constraint that it is greater than the input floor $\mathrm{LGD}^{\mathrm{Floor}}$. For unsecured
exposures, we have $\mathrm{LGD} \geq \mathrm{LGD}^{\mathrm{Floor}}$ where $\mathrm{LGD}^{\mathrm{Floor}} = 25\%$. For secured exposures, we
have $\mathrm{LGD}^{\star} \geq \mathrm{LGD}^{\mathrm{Floor}}_{\star}$ where:

$$
\mathrm{LGD}^{\mathrm{Floor}}_{\star} = \omega \cdot \mathrm{LGD}^{\mathrm{Floor}} + (1 - \omega) \cdot \mathrm{LGD}^{\mathrm{Floor}}_C
$$

$\mathrm{LGD}^{\mathrm{Floor}} = 25\%$ and $\mathrm{LGD}^{\mathrm{Floor}}_C$ depends on the collateral type: 0% for financial collateral,
10% for receivables and real estate and 15% for other physical collateral.
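
To fix ideas, here is a minimal Python sketch of the blended LGD for a partially collateralized exposure. It rests on two assumptions that we state explicitly: the weight of the unsecured part is taken as ω = 1 − (1 − H_C)·C / ((1 + H_E)·EAD), and ω is clipped to [0, 1] so that the secured part cannot exceed the exposure. The numerical inputs are hypothetical.

```python
def collateral_weight(ead, collateral, h_e, h_c):
    """Weight omega of the unsecured part (assumed formula, see the text above)."""
    omega = 1.0 - (1.0 - h_c) * collateral / ((1.0 + h_e) * ead)
    # Assumption: the secured part is capped by the exposure amount
    return min(max(omega, 0.0), 1.0)


def blended_lgd(lgd_unsecured, lgd_collateral, omega):
    """LGD* = omega * LGD + (1 - omega) * LGD_C."""
    return omega * lgd_unsecured + (1.0 - omega) * lgd_collateral


# Hypothetical senior corporate exposure of 100 secured by real estate worth 50
omega = collateral_weight(ead=100.0, collateral=50.0, h_e=0.0, h_c=0.40)

lgd_firb = blended_lgd(0.40, 0.20, omega)   # FIRB values: 40% and 20%
lgd_floor = blended_lgd(0.25, 0.10, omega)  # AIRB input floors: 25% and 10%

print(f"omega = {omega:.2%}")
print(f"FIRB LGD* = {lgd_firb:.2%}")
print(f"AIRB floor = {lgd_floor:.2%}")
```

With these inputs, 70% of the exposure is treated as unsecured, the FIRB LGD is equal to 34% and an AIRB estimate could not be lower than 20.5%.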


Remark 32 Since the capital requirement is based on the unexpected loss, the Basel
Committee imposes that the expected loss is deducted from regulatory capital.


3.2.5 The securitization framework 


Capital calculations for securitization require developing a more complex approach than
the IRB approach, because the bank is not directly exposed to the loss of the credit portfolio,
but to the conditional loss of the credit portfolio. This is particularly true if we consider
a CDO tranche since we cannot measure the risk of equity, mezzanine and senior tranches
in the same way. In what follows, we do not study the Basel II framework, which was very
complex, but revealed many weaknesses during the 2008 Global Financial Crisis. We prefer
to focus on the Basel III framework (BCBS, 2016e), which has been implemented since
January 2018.


3.2.5.1 Overview of the approaches 
The securitization framework consists of three approaches:

1. Securitization internal ratings-based approach (SEC-IRBA)
2. Securitization external ratings-based approach (SEC-ERBA)
3. Securitization standardized approach (SEC-SA)

Contrary to credit risk, the hierarchy is reversed. The SEC-IRBA must be used first and is
based on the capital charge $K_{\mathrm{IRB}}$ of the underlying exposures. If the bank cannot calculate
$K_{\mathrm{IRB}}$ for a given securitization exposure, because it does not have access to the collateral
pool of the debt⁷², it has to use the SEC-ERBA. If the tranche is unrated or if external
ratings are not allowed, the bank must finally use the SEC-SA. When it is not possible to
use one of the three approaches, the risk weight of the securitization exposure is set to
1 250%.


71 We recall that M is set to 2.5 years for all exposures, except for repo-style and retail exposures where
the maturity is set to 6 and 12 months.


This framework has been developed for three types of exposures: STC securitization,
non-STC securitization and resecuritization. STC stands for simple, transparent and
comparable securitizations. In July 2015, the BCBS and the Board of IOSCO published a
set of 14 criteria for identifying STC exposures. These criteria are related to the collateral
pool (asset risk), the transparency (structural risk) and the governance (fiduciary and
servicer risk) of the SPV. Examples of criteria are the nature of the assets, the payment
status, alignment of interests, transparency to investors, etc. Resecuritization implies
that some underlying assets are themselves securitization exposures. For example, a
CDO-squared is a resecuritization, because the asset pool is a basket of CDO tranches.


3.2.5.2 Internal ratings-based approach (SEC-IRBA) 


In order to implement SEC-IRBA, the bank must conduct a strict due diligence of the
pay-through securitization exposure in order to have comprehensive information on the
underlying exposures. For each asset that composes the collateral pool, it calculates the
capital charge. Then, the bank determines $K_{\mathrm{IRB}}$ as the ratio between the sum of individual
capital charges and the exposure amount of the collateral pool. If the bank does not have
all the information, it can use the following formula:

$$
K_{\mathrm{IRB}} = \omega \cdot K^{\star}_{\mathrm{IRB}} + (1 - \omega) \cdot K_{\mathrm{SA}}
$$

where $K^{\star}_{\mathrm{IRB}}$ is the IRB capital requirement for the IRB pool⁷³, $K_{\mathrm{SA}}$ is the SA capital
requirement for the underlying exposures and $\omega$ is the percentage of the IRB pool. However,
this formula is only valid if $\omega \geq 95\%$. Otherwise, the bank must use the SEC-SA.

We consider a tranche, where A is the attachment point and D is the detachment point.
If $K_{\mathrm{IRB}} \geq D$, the Basel Committee considers that the risk is very high and RW is set to
1 250%. Otherwise, we have:

$$
\mathrm{RW} = 12.5 \cdot \left( \frac{\max(K_{\mathrm{IRB}}, A) - A}{D - A} \right) + 12.5 \cdot \left( \frac{D - \max(K_{\mathrm{IRB}}, A)}{D - A} \right) \cdot K_{\mathrm{SSFA}}(K_{\mathrm{IRB}})
\tag{3.36}
$$

where $K_{\mathrm{SSFA}}(K_{\mathrm{IRB}})$ is the capital charge for one unit of securitization exposure⁷⁴.
Therefore, we obtain two cases. If $A < K_{\mathrm{IRB}} < D$, we replace $\max(K_{\mathrm{IRB}}, A)$ by $K_{\mathrm{IRB}}$ in the
previous formula. It follows that the capital charge between the attachment point A and
$K_{\mathrm{IRB}}$ is risk-weighted by 1 250% and the remaining part between $K_{\mathrm{IRB}}$ and the detachment
point D is risk-weighted by $12.5 \cdot K_{\mathrm{SSFA}}(K_{\mathrm{IRB}})$. This is equivalent to considering that the
sub-tranche $K_{\mathrm{IRB}} - A$ has already defaulted, while the credit risk is on the sub-tranche
$D - K_{\mathrm{IRB}}$. In the second case $K_{\mathrm{IRB}} < A < D$, the first term of the formula vanishes, and
we retrieve the RWA formula (3.27) on page 177.


72 The structure of pay-through securitization is shown in Figure 3.12 on page 139.

73 It corresponds to the part of the collateral pool, for which the bank has the information on the individual
underlying exposures.

74 It corresponds to the variable $K^{\star}$ in the IRB formula on page 177.

The capital charge for one unit of securitization exposure is equal to⁷⁵:

$$
K_{\mathrm{SSFA}}(K_{\mathrm{IRB}}) = \frac{\exp(c u) - \exp(c l)}{c (u - l)}
$$

where $c = -(p K_{\mathrm{IRB}})^{-1}$, $u = D - K_{\mathrm{IRB}}$, $l = (A - K_{\mathrm{IRB}})^{+}$ and:

$$
p = \max\left( 0.3;\; m_{\mathrm{STC}} \cdot \left( \alpha + \frac{\beta}{N} + \gamma \cdot K_{\mathrm{IRB}} + \delta \cdot \mathrm{LGD} + \epsilon \cdot M_{[A,D]} \right) \right)
$$

The parameter $p$ is called the supervisory parameter and is a function of the effective
number⁷⁶ of loans $N$, the average LGD and the effective maturity⁷⁷ $M_{[A,D]}$ of the tranche. The
coefficient $m_{\mathrm{STC}}$ is equal to 1 for non-STC securitizations and 0.5 for STC securitizations,
while the other parameters $\alpha$, $\beta$, $\gamma$, $\delta$ and $\epsilon$ are given in Table 3.31. We notice that the
values depend on the underlying portfolio (wholesale or retail), the granularity ($N < 25$ or
$N \geq 25$) and the seniority.


TABLE 3.31: Value of the parameters α, β, γ, δ and ε (SEC-IRBA)

Category     Senior   Granularity   α      β      γ       δ      ε
Wholesale    ✓        N ≥ 25        0.00   3.56   −1.85   0.55   0.07
             ✓        N < 25        0.11   2.61   −2.91   0.68   0.07
                      N ≥ 25        0.16   2.87   −1.03   0.21   0.07
                      N < 25        0.22   2.35   −2.46   0.48   0.07
Retail       ✓                      0.00   0.00   −7.48   0.71   0.24
                                    0.00   0.00   −5.78   0.55   0.27



Remark 33 The derivation of these formulas is based on the model of Gordy and Jones 
(2003). 


Example 31 We consider a non-STC CDO based on wholesale assets with three tranches:
equity (0%-5%), mezzanine (5%-30%) and senior (30%-100%). The remaining maturity
is equal to 10 years. The analysis of the underlying portfolio shows that the effective number
of loans N is equal to 30 and the average LGD is equal to 30%. We also assume that
$K^{\star}_{\mathrm{IRB}} = 18\%$, $K_{\mathrm{SA}} = 20\%$ and $\omega = 95\%$.


We have $K_{\mathrm{IRB}} = 0.95 \times 18\% + 0.05 \times 20\% = 18.1\%$. Since $K_{\mathrm{IRB}} > D_{\mathrm{equity}}$, we deduce
that $\mathrm{RW}_{\mathrm{equity}} = 1\,250\%$. For the mezzanine tranche, we have $1 + 0.8 \times (M - 1) = 8.2$ years,
meaning that the 5-year cap is applied. Using Table 3.31 (fourth row), we deduce that
$\alpha = 0.16$, $\beta = 2.87$, $\gamma = -1.03$, $\delta = 0.21$ and $\epsilon = 0.07$. It follows that:

$$
p = \max\left( 0.30;\; 0.16 + \frac{2.87}{30} - 1.03 \times 18.1\% + 0.21 \times 30\% + 0.07 \times 5 \right) = 48.22\%
$$

Since we have $c = -11.46$, $u = 11.90\%$ and $l = 0\%$, we obtain $K_{\mathrm{SSFA}}(K_{\mathrm{IRB}}) = 54.59\%$.
Finally, Equation (3.36) gives $\mathrm{RW}_{\mathrm{mezzanine}} = 979.79\%$. If we perform the same analysis for
the senior tranche⁷⁸, we obtain $\mathrm{RW}_{\mathrm{senior}} = 10.84\%$.


75 SSFA means simplified supervisory formula approach.

76 The effective number is equal to the inverse of the Herfindahl index $H$ where $H = \sum_{i=1}^{n} w_i^2$ and $w_i$ is
the weight of the $i$-th asset. In our case, we have $w_i = \mathrm{EAD}_i / \sum_{j=1}^{n} \mathrm{EAD}_j$, implying that:

$$
N = \frac{\left( \sum_{i=1}^{n} \mathrm{EAD}_i \right)^2}{\sum_{i=1}^{n} \mathrm{EAD}_i^2}
$$

77 Like for the IRB approach, $M_{[A,D]}$ is the effective maturity with a one-year floor and five-year cap. The
effective maturity can be calculated as the weighted-average maturity of the cash-flows of the tranche or
$1 + 0.8 \cdot (M - 1)$ where $M$ is the legal maturity of the tranche.
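
The computations of Example 31 can be reproduced with a short Python sketch. The function names are ours and the script only implements the formulas given above (supervisory parameter, SSFA capital charge and Equation (3.36)); it is an illustration under these assumptions, not a regulatory implementation.

```python
import math


def supervisory_p(alpha, beta, gamma, delta, eps, n, k_irb, lgd, maturity, m_stc=1.0):
    """Supervisory parameter p with its 30% floor."""
    raw = alpha + beta / n + gamma * k_irb + delta * lgd + eps * maturity
    return max(0.3, m_stc * raw)


def k_ssfa(k, a, d, p):
    """Capital charge for one unit of securitization exposure."""
    c = -1.0 / (p * k)
    u = d - k
    l = max(a - k, 0.0)
    return (math.exp(c * u) - math.exp(c * l)) / (c * (u - l))


def tranche_rw(k, a, d, p):
    """Risk weight of the tranche [a, d], Equation (3.36)."""
    if k >= d:
        return 12.5  # 1 250%
    k_low = max(k, a)
    return (12.5 * (k_low - a) / (d - a)
            + 12.5 * (d - k_low) / (d - a) * k_ssfa(k, a, d, p))


# Inputs of Example 31
k_irb = 0.95 * 0.18 + 0.05 * 0.20                   # 18.1%
m_eff = min(max(1.0 + 0.8 * (10.0 - 1.0), 1.0), 5.0)  # five-year cap applies

p_mezz = supervisory_p(0.16, 2.87, -1.03, 0.21, 0.07, 30, k_irb, 0.30, m_eff)
p_senior = supervisory_p(0.00, 3.56, -1.85, 0.55, 0.07, 30, k_irb, 0.30, m_eff)

rw_equity = tranche_rw(k_irb, 0.00, 0.05, p_mezz)
rw_mezz = tranche_rw(k_irb, 0.05, 0.30, p_mezz)
rw_senior = tranche_rw(k_irb, 0.30, 1.00, p_senior)

print(f"p (mezzanine) = {p_mezz:.2%}, p (senior) = {p_senior:.2%}")
print(f"RW equity = {rw_equity:.2%}")
print(f"RW mezzanine = {rw_mezz:.2%}")
print(f"RW senior = {rw_senior:.2%}")
```

The sketch gives back the figures of the example: a supervisory parameter of 48.22% for the mezzanine tranche and risk weights of 1 250%, 979.79% and 10.84% for the three tranches.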


3.2.5.3 External ratings-based approach (SEC-ERBA) 


Under the ERBA, we have:

$$
\mathrm{RWA} = \mathrm{EAD} \cdot \mathrm{RW}
$$

where EAD is the securitization exposure amount and RW is the risk weight that depends
on the external rating⁷⁹ and four other parameters: the STC criterion, the seniority of the
tranche, the maturity and the thickness of the tranche. In the case of short-term ratings,
the risk weights are given below:

Rating      A-1/P-1   A-2/P-2   A-3/P-3   Other
STC         10%       30%       60%       1 250%
non-STC     15%       50%       100%      1 250%

For long-term ratings, the risk weight goes from 15% for AAA-grade to 1 250% (Table 2,
BCBS 2016e, page 27). An example of risk weights for non-STC securitizations is given
below:

                 Senior              Non-senior
Rating           1Y        5Y        1Y        5Y
AAA              15%       20%       15%       70%
AA               25%       40%       30%       120%
A                50%       65%       80%       180%
BBB              90%       105%      220%      310%
BB               160%      180%      620%      760%
B                310%      340%      1 050%    1 050%
CCC              460%      505%      1 250%    1 250%
Below CCC-       1 250%    1 250%    1 250%    1 250%

These risk weights are then adjusted for taking into account the effective maturity $M_{[A,D]}$
and the thickness $D - A$ of the tranche. The maturity adjustment corresponds to a linear
interpolation between one and five years. The thickness adjustment must be done for
non-senior tranches by multiplying the risk weight by the factor $1 - \min(D - A; 0.5)$.


Example 32 We consider Example 31 and we assume that the mezzanine and senior 
tranches are rated BB and AAA. 


Using the table above, we deduce that the non-adjusted risk weights are equal to
1 250% for the equity tranche, 760% for the mezzanine tranche and 20% for the senior
tranche. There is no maturity adjustment because $M_{[A,D]}$ is equal to five years. Finally,
we obtain $\mathrm{RW}_{\mathrm{equity}} = 1\,250\% \times (1 - \min(5\%, 50\%)) = 1\,187.5\%$,
$\mathrm{RW}_{\mathrm{mezzanine}} = 760\% \times (1 - \min(25\%, 50\%)) = 570\%$ and $\mathrm{RW}_{\mathrm{senior}} = 20\%$.
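
The two adjustments can be illustrated with the following Python sketch, which is restricted to the non-STC long-term table given above. The dictionary and the function name are ours. Following the example, the equity tranche is mapped to the 1 250% bucket, and other regulatory refinements that are not discussed in the text are ignored.

```python
# Non-STC long-term risk weights: (senior 1Y, senior 5Y, non-senior 1Y, non-senior 5Y)
NON_STC_RW = {
    "AAA": (0.15, 0.20, 0.15, 0.70),
    "AA": (0.25, 0.40, 0.30, 1.20),
    "A": (0.50, 0.65, 0.80, 1.80),
    "BBB": (0.90, 1.05, 2.20, 3.10),
    "BB": (1.60, 1.80, 6.20, 7.60),
    "B": (3.10, 3.40, 10.50, 10.50),
    "CCC": (4.60, 5.05, 12.50, 12.50),
    "Below CCC-": (12.50, 12.50, 12.50, 12.50),
}


def erba_rw(rating, senior, maturity, a, d):
    """Risk weight after the maturity and thickness adjustments."""
    s1, s5, n1, n5 = NON_STC_RW[rating]
    rw_1y, rw_5y = (s1, s5) if senior else (n1, n5)
    # Linear interpolation between one and five years
    m = min(max(maturity, 1.0), 5.0)
    rw = rw_1y + (rw_5y - rw_1y) * (m - 1.0) / 4.0
    # Thickness adjustment for non-senior tranches only
    if not senior:
        rw *= 1.0 - min(d - a, 0.5)
    return rw


# Example 32: effective maturity of five years
rw_equity = erba_rw("Below CCC-", False, 5.0, 0.00, 0.05)
rw_mezz = erba_rw("BB", False, 5.0, 0.05, 0.30)
rw_senior = erba_rw("AAA", True, 5.0, 0.30, 1.00)

print(f"RW equity = {rw_equity:.1%}")
print(f"RW mezzanine = {rw_mezz:.1%}")
print(f"RW senior = {rw_senior:.1%}")
```

We retrieve the risk weights 1 187.5%, 570% and 20% of Example 32. With a shorter effective maturity, the interpolation would lower the mezzanine risk weight towards the one-year value of 620% before the thickness adjustment.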


3.2.5.4 Standardized approach (SEC-SA) 


The SA is very close to the IRBA since it uses Equation (3.36) by replacing $K_{\mathrm{IRB}}$
by $K_A$ and the supervisory parameter $p$ by the default values 0.5 and 1 for STC and
non-STC securitizations. To calculate $K_A$, we first determine $K_{\mathrm{SA}}$, which is the ratio between
the weighted average capital charge of the underlying portfolio computed with the SA approach
and the exposure amount of the underlying portfolio. Then, we have:

$$
K_A = (1 - \omega) \cdot K_{\mathrm{SA}} + \omega \cdot 50\%
$$

where $\omega$ is the percentage of underlying exposures that are 90 days or more past due.


Remark 34 The SEC-SA is the only approach allowed for calculating the capital
requirement of resecuritization exposures. In this case, $\omega$ is set to zero and the supervisory
parameter $p$ is equal to 1.5.


If we consider Example 31 on page 187 and assume that $\omega = 0$, we obtain $\mathrm{RW}_{\mathrm{equity}} = 1\,250\%$,
$\mathrm{RW}_{\mathrm{mezzanine}} = 1\,143\%$ and $\mathrm{RW}_{\mathrm{senior}} = 210.08\%$.


78 In this case, the parameters are $\alpha = 0$, $\beta = 3.56$, $\gamma = -1.85$, $\delta = 0.55$ and $\epsilon = 0.07$ (second row
in Table 3.31). We have $p = \max(30\%; 29.88\%) = 30\%$, $c = -18.42$, $u = 81.90\%$, $l = 11.90\%$, and
$K_{\mathrm{SSFA}}(K_{\mathrm{IRB}}) = 0.87\%$.

79 By definition, this approach is only available for tranches that are rated.
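
These figures can be checked with a compact Python sketch of the SEC-SA, in which the capital charge of the pool and the supervisory parameter are the only inputs. The function name is ours; the default value p = 1 corresponds to a non-STC securitization.

```python
import math


def sec_sa_rw(a, d, k_sa, w=0.0, p=1.0):
    """Risk weight of the tranche [a, d] under the SEC-SA.

    k_sa: SA capital charge of the pool, w: share of exposures 90 days or
    more past due, p: supervisory parameter (0.5 STC, 1 non-STC).
    """
    k_a = (1.0 - w) * k_sa + w * 0.5
    if k_a >= d:
        return 12.5  # 1 250%
    c = -1.0 / (p * k_a)
    u = d - k_a
    l = max(a - k_a, 0.0)
    k_ssfa = (math.exp(c * u) - math.exp(c * l)) / (c * (u - l))
    k_low = max(k_a, a)
    return 12.5 * (k_low - a) / (d - a) + 12.5 * (d - k_low) / (d - a) * k_ssfa


# Tranches of Example 31 with K_SA = 20% and no delinquent exposures
rw_equity = sec_sa_rw(0.00, 0.05, 0.20)
rw_mezz = sec_sa_rw(0.05, 0.30, 0.20)
rw_senior = sec_sa_rw(0.30, 1.00, 0.20)

print(f"RW equity = {rw_equity:.2%}")
print(f"RW mezzanine = {rw_mezz:.2%}")
print(f"RW senior = {rw_senior:.2%}")
```

The senior tranche illustrates the conservativeness of the SA: its risk weight is about 210%, compared to 10.84% with the SEC-IRBA in Example 31.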


FIGURE 3.23: Risk weight of securitization exposures
[Recoverable panel titles: IRBA, wholesale, D − A = 5%; IRBA, retail, D − A = 5%]


Example 33 We consider a CDO tranche, whose attachment and detachment points are
A and D. We assume that $K_{\mathrm{IRB}} = K_A = 20\%$, $N = 30$, $\mathrm{LGD} = 50\%$ and $\omega = 0$.


In Figure 3.23, we have represented the evolution of the risk weight RW of the tranche
[A, D] for different values of A and D. For the first three panels, the thickness of the tranche
is equal to 5%, while the detachment point is set to 100% for the fourth panel. In each panel,
we consider two cases: non-STC and STC. If we compare the first and second panels, we
notice the impact of the asset category (wholesale vs retail) on the risk weight. The third
panel shows that the SA approach penalizes more non-STC securitization exposures. Since
the detachment point is equal to 100%, the fourth panel corresponds to a senior tranche for
high values of the attachment point A and a non-senior tranche when the attachment point
A is low. In this example, we assume that the tranche becomes non-senior when A < 30%.
We observe a small cliff effect for non-STC securitization exposures.


3.3 Credit risk modeling


We now address the problem of parameter specification. This mainly concerns the ex- 
posure at default, the loss given default and the probability of default because the effective 
maturity is well defined. This section also analyzes default correlations and non granular 
portfolios when the bank develops its own credit model for calculating economic capital and 
satisfying Pillar 2 requirements. 


3.3.1 Exposure at default 


According to BCBS (2017c), the exposure at default “for an on-balance sheet or
off-balance sheet item is defined as the expected gross exposure of the facility upon default of
the obligor”. Generally, the computation of EAD for on-balance sheet assets is not an issue.
For example, EAD corresponds to the gross notional in the case of a loan or a credit. In
fact, the big issue concerns off-balance sheet items, such as revolving lines of credit, credit
cards or home equity lines of credit (HELOC). At the default time $\tau$, we have (Taplin et
al., 2007):

$$
\mathrm{EAD}(\tau \mid t) = B(t) + \mathrm{CCF} \cdot (L(t) - B(t))
\tag{3.37}
$$

where $B(t)$ is the outstanding balance (or current drawn) at time $t$, $L(t)$ is the current
authorized limit of the credit facility⁸⁰ and CCF is the credit conversion factor. This means
that the exposure at default for off-balance sheet items has two components: the current
drawn, which is a non-random component, and the future drawn, which is a random component.
From Equation (3.37), we deduce that:

$$
\mathrm{CCF} = \frac{\mathrm{EAD}(\tau \mid t) - B(t)}{L(t) - B(t)}
\tag{3.38}
$$

At first sight, it looks easy to estimate the credit conversion factor. Let us consider the
off-balance sheet item $i$ that has defaulted. We have:

$$
\mathrm{CCF}_i(\tau_i - t) = \frac{B_i(\tau_i) - B_i(t)}{L_i(t) - B_i(t)}
$$

At time $\tau_i$, we observe the default of Asset $i$ and the corresponding exposure at default,
which is equal to the outstanding balance $B_i(\tau_i)$. Then, we have to choose a date $t < \tau_i$
to observe $B_i(t)$ and $L_i(t)$ in order to calculate the CCF. We notice that it is sensitive to
the time period $\tau_i - t$, but banks generally use a one-year time period. Therefore, we can
calculate the mean or the quantile $\alpha$ of a sample $\{\mathrm{CCF}_1, \ldots, \mathrm{CCF}_n\}$ for a given homogeneous
category of off-balance sheet items. Like the supervisory CCF values, the estimated CCF is
a figure between 0% and 100%.


In practice, it is difficult to estimate CCF values for five reasons:

1. As explained by Qi (2009), there is a ‘race to default’ between borrowers and lenders.
   Indeed, “as borrowers approach default, their financial conditions deteriorate and they
   may use the current undrawn as a source of funding, whereas lenders may cut back
   credit lines to reduce potential losses” (Qi, 2009, page 4).

2. $L_i(t)$ depends on the current time $t$, meaning that it could evolve over time.

3. The computation of the CCF is sensitive to the denominator $L_i(t) - B_i(t)$, which can
   be small. When $L_i(t) \approx B_i(t)$, the CCF ratio is unstable.

4. We have made the assumption that $\mathrm{CCF}_i(\tau_i - t) \in [0, 1]$, implying that
   $B_i(\tau_i) \geq B_i(t)$ and $B_i(\tau_i) \leq L_i(t)$. This is not always true. We can imagine that the
   outstanding balance decreases between the current time and the default time
   ($\mathrm{CCF}_i(\tau_i - t) < 0$) or the outstanding balance at the default time is greater than the
   limit $L_i(t)$. Jacobs, Jr. (2010) reports extreme variations larger than ±3 000% when
   computing raw CCF values!

5. The credit conversion factor is generally an increasing function of the default
   probability of the borrower.


80 The current undrawn $L(t) - B(t)$ is the amount that the debtor is able to draw upon in addition to
the current drawn $B(t)$.


Because of the previous issues, the observed CCF is floored at 0% and capped at 100%.
Tong et al. (2016) report the distribution of the credit conversion factor of credit cards from
a UK bank⁸¹, and notice that the observations are mainly concentrated on the two extreme
points 0% and 100% after truncation. Another measure for modeling the exposure at default
is to consider the facility utilization change factor (Yang and Tkachenko, 2012):

$$
\mathrm{UCF}_i = \frac{B_i(\tau_i) - B_i(t)}{L_i(t)}
$$

It corresponds to the credit conversion factor, where the current undrawn amount
$L_i(t) - B_i(t)$ is replaced by the current authorized limit $L_i(t)$. It has the advantage of being
more stable, in particular around the singularity $L_i(t) = B_i(t)$.
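
The following Python sketch illustrates these definitions on three hypothetical defaulted facilities. The data and the function names are ours; the raw CCF, its truncation to [0, 1], the utilization change factor and Equation (3.37) are implemented as stated in the text.

```python
from statistics import mean


def raw_ccf(b_default, b_ref, limit_ref):
    """Raw CCF: (B(tau) - B(t)) / (L(t) - B(t))."""
    undrawn = limit_ref - b_ref
    if undrawn == 0:
        raise ValueError("the CCF is undefined when L(t) = B(t)")
    return (b_default - b_ref) / undrawn


def truncated_ccf(b_default, b_ref, limit_ref):
    """Observed CCF floored at 0% and capped at 100%."""
    return min(max(raw_ccf(b_default, b_ref, limit_ref), 0.0), 1.0)


def ucf(b_default, b_ref, limit_ref):
    """Utilization change factor: (B(tau) - B(t)) / L(t)."""
    return (b_default - b_ref) / limit_ref


def ead(balance, limit, ccf):
    """Equation (3.37): current drawn plus a fraction of the current undrawn."""
    return balance + ccf * (limit - balance)


# Hypothetical facilities (B(tau), B(t), L(t)), with t one year before default
facilities = [(80.0, 50.0, 100.0), (40.0, 50.0, 100.0), (120.0, 95.0, 100.0)]

raw = [raw_ccf(*f) for f in facilities]
trunc = [truncated_ccf(*f) for f in facilities]
ucfs = [ucf(*f) for f in facilities]
ccf_hat = mean(trunc)

print("raw CCF      :", [f"{x:.0%}" for x in raw])
print("truncated CCF:", [f"{x:.0%}" for x in trunc])
print("UCF          :", [f"{x:.0%}" for x in ucfs])
print(f"estimated CCF = {ccf_hat:.2%}")
print(f"EAD of a facility drawn at 50 with a limit of 100: {ead(50.0, 100.0, ccf_hat):.2f}")
```

The third facility shows the instability of the raw CCF: the balance is close to the limit, so the raw value is equal to 500%, whereas the utilization change factor is only 25%.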

The econometrics of CCF is fairly basic. As said previously, it consists in estimating the
mean or the quantile $\alpha$ of a sample $\{\mathrm{CCF}_1, \ldots, \mathrm{CCF}_n\}$. For that, we can use the cohort
method or the time horizon approach (Witzany, 2011). In the cohort method, we divide
the study period into fixed intervals (6 or 12 months). For each asset, we identify if it has
defaulted during the interval, and then we set $t$ to the starting date of the interval. In the
time horizon approach, $t$ is equal to the default time $\tau_i$ minus a fixed horizon (e.g. one,
three or 12 months). Sometimes, it can be useful to include some explanatory variables. In
this case, the standard model is the Tobit linear regression, which is presented on page 708,
because data are censored and the predicted value of CCF must lie in the interval [0, 1].


3.3.2 Loss given default 
3.3.2.1 Definition 


The recovery rate R is the percentage of the notional on the defaulted debt that can be
recovered. In the Basel framework, the recovery rate is not explicitly used, and the concept
of loss given default is preferred for measuring the credit portfolio loss. The two metrics are
expressed as a percentage of the face value, and we have:

$$
\mathrm{LGD} \geq 1 - R
$$

Let us consider a bank that is lending $100 mn to a corporate firm. We assume that the
firm defaults at some point and the bank recovers $60 mn. We deduce that the recovery rate
is equal to:

$$
R = \frac{60}{100} = 60\%
$$

In order to recover $60 mn, the bank has incurred some operational and litigation costs,
whose amount is $5 mn. In this case, the bank has lost $40 mn plus $5 mn, implying that
the loss given default is equal to:

$$
\mathrm{LGD} = \frac{40 + 5}{100} = 45\%
$$

In fact, this example shows that R and LGD are related in the following way:

$$
\mathrm{LGD} = 1 - R + c
$$

where c is the litigation cost, expressed as a percentage of the face value (5% in our
example). We now understand why the loss given default is the right measure when computing
the portfolio loss.


81 See Figure 1 on page 912 in Tong et al. (2016).


Schuermann (2004) identifies three approaches for calculating the loss given default: 
1. Market LGD 

2. Implied LGD 

3. Workout LGD 


The market LGD is deduced from the bond price just after the default⁸². It is easy to
calculate and available for large corporates and banks. The implied LGD is calculated from 
a theoretical pricing model of bonds or CDS. The underlying idea is to estimate the implied 
loss given default, which is priced by the market. As for the first method, this metric 
is easy to calculate, but it depends on the model assumptions. The last approach is the 
workout or ultimate LGD. Indeed, the loss given default has three components: the direct 
loss of principal, the loss of carrying non-performing loans and the workout operational and 
legal costs. The workout LGD is the right measure when considering the IRB approach. 
Nevertheless, Schuermann (2004) notices that between two and three years are needed on 
average to obtain the recovery. 


In what follows, we present two approaches for modeling LGD. The first approach
considers that LGD is a random variable, whose probability distribution has to be estimated:

$$
\mathrm{LGD} \sim \mathbf{F}(x)
\tag{3.39}
$$

However, we recall that the loss given default in the Basel IRB formulas does not correspond
to the random variable, but to its expectation $\mathbb{E}[\mathrm{LGD}]$. Therefore, the second approach
consists in estimating the conditional expectation:

$$
\begin{aligned}
\mathbb{E}[\mathrm{LGD}] &= \mathbb{E}[\mathrm{LGD} \mid X_1 = x_1, \ldots, X_m = x_m] \\
&= g(x_1, \ldots, x_m)
\end{aligned}
\tag{3.40}
$$

where $(X_1, \ldots, X_m)$ are the risk factors that determine the loss given default.


Remark 35 We notice that $R \in [0, 1]$, but $\mathrm{LGD} \geq 0$. Indeed, we can imagine that the
litigation cost can be high compared to the recovery part of the debt. In this case, we can
have $c > R$, implying that $\mathrm{LGD} > 100\%$. For instance, if $R = 20\%$ and $c = 30\%$, we obtain
$\mathrm{LGD} = 110\%$. This situation is not fanciful, because R and c are not known at the default
time. The bank will then begin to incur costs without knowing the recovery amount. For
example, one typical situation is $R = 0\%$ and $c > 0$, when the bank discovers that there is
no possible recovery, but has already incurred some litigation costs. Even if LGD can be larger
than 100%, we assume that $\mathrm{LGD} \in [0, 1]$ because these situations are unusual.


82 This measure is also called ‘trading price recovery’.

3.3.2.2 Stochastic modeling


Using a parametric distribution In this case, we generally use the beta distribution
$\mathcal{B}(\alpha, \beta)$, which is described on page 1053. Its density function is given by:

$$
f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{\mathfrak{B}(\alpha, \beta)}
$$

where $\mathfrak{B}(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} \, \mathrm{d}t$. The mean and the variance are:

$$
\mu(X) = \mathbb{E}[X] = \frac{\alpha}{\alpha + \beta}
$$

and:

$$
\sigma^2(X) = \operatorname{var}(X) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
$$

When $\alpha$ and $\beta$ are greater than 1, the distribution has one mode
$x_{\mathrm{mode}} = (\alpha - 1) / (\alpha + \beta - 2)$. This probability distribution is very flexible and
makes it possible to obtain the various shapes that are given in Figure 3.24:

• if $\alpha = 1$ and $\beta = 1$, we obtain the uniform distribution; if $\alpha \to \infty$ and $\beta \to \infty$, we
  obtain the Dirac distribution at the point $x = 0.5$; if one parameter goes to zero, we
  obtain a Bernoulli distribution;

• if $\alpha = \beta$, the distribution is symmetric around $x = 0.5$; we have a bell curve when
  the two parameters $\alpha$ and $\beta$ are higher than 1, and a U-shape curve when the two
  parameters $\alpha$ and $\beta$ are lower than 1;

• if $\alpha > \beta$, the skewness is negative and the distribution is left-skewed; if $\alpha < \beta$, the
  skewness is positive and the distribution is right-skewed.


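Given the estimated mean $\hat{\mu}_{\mathrm{LGD}}$ and standard deviation $\hat{\sigma}_{\mathrm{LGD}}$ of a sample of losses given
default, we can calibrate the parameters $\alpha$ and $\beta$ using the method of moments⁸³:

$$
\hat{\alpha}_{\mathrm{MM}} = \frac{\hat{\mu}_{\mathrm{LGD}}^2 \left( 1 - \hat{\mu}_{\mathrm{LGD}} \right)}{\hat{\sigma}_{\mathrm{LGD}}^2} - \hat{\mu}_{\mathrm{LGD}}
\tag{3.41}
$$

and:

$$
\hat{\beta}_{\mathrm{MM}} = \frac{\hat{\mu}_{\mathrm{LGD}} \left( 1 - \hat{\mu}_{\mathrm{LGD}} \right)^2}{\hat{\sigma}_{\mathrm{LGD}}^2} - \left( 1 - \hat{\mu}_{\mathrm{LGD}} \right)
\tag{3.42}
$$

The other approach is to use the method of maximum likelihood, which is described in
Section 10.1.2 on page 614.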


Example 34 We consider the following sample of losses given default: 68%, 90%, 22%, 
45%, 17%, 25%, 89%, 65%, 75%, 56%, 87%, 92% and 46%. 


We obtain $\hat{\mu}_{\mathrm{LGD}} = 59.77\%$ and $\hat{\sigma}_{\mathrm{LGD}} = 27.02\%$. Using the method of moments, the
estimated parameters are $\hat{\alpha}_{\mathrm{MM}} = 1.37$ and $\hat{\beta}_{\mathrm{MM}} = 0.92$, whereas we have $\hat{\alpha}_{\mathrm{ML}} = 1.84$
and $\hat{\beta}_{\mathrm{ML}} = 1.25$ for the method of maximum likelihood. We notice that the two calibrated
probability distributions have different shapes (see Figure 3.25).
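
The method of moments is easy to reproduce with the Python standard library. In the sketch below, the function name is ours and we assume that the standard deviation is the unbiased sample estimate (division by n − 1), which is the convention that matches the figure of 27.02%. The maximum likelihood estimates require a numerical optimizer and are not computed here.

```python
from statistics import mean, stdev

# Sample of losses given default of Example 34
sample = [0.68, 0.90, 0.22, 0.45, 0.17, 0.25, 0.89,
          0.65, 0.75, 0.56, 0.87, 0.92, 0.46]


def beta_method_of_moments(mu, sigma):
    """Parameters (alpha, beta) of the beta distribution, Equations (3.41)-(3.42)."""
    # Common factor mu (1 - mu) / sigma^2 - 1, which must be positive
    factor = mu * (1.0 - mu) / sigma ** 2 - 1.0
    if factor <= 0:
        raise ValueError("sigma is too large for a beta distribution")
    return mu * factor, (1.0 - mu) * factor


mu_hat = mean(sample)
sigma_hat = stdev(sample)  # assumption: unbiased estimator (n - 1)
alpha_mm, beta_mm = beta_method_of_moments(mu_hat, sigma_hat)

print(f"mean = {mu_hat:.2%}, standard deviation = {sigma_hat:.2%}")
print(f"alpha_MM = {alpha_mm:.2f}, beta_MM = {beta_mm:.2f}")
```

Since the estimated value of β is lower than 1, the calibrated density goes to infinity when the loss given default tends to 100%, which is not the case with the maximum likelihood estimates.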


83 See Section 10.1.3.1 on page 628.

FIGURE 3.24: Probability density function of the beta distribution $\mathcal{B}(\alpha, \beta)$

FIGURE 3.25: Calibration of the beta distribution

FIGURE 3.26: Maximum standard deviation $\sigma^{+}(\mu)$


Remark 36 We can calibrate the beta distribution as long as we respect some constraints
on $\hat{\mu}_{\mathrm{LGD}}$ and $\hat{\sigma}_{\mathrm{LGD}}$. Using Equations (3.41) and (3.42), we deduce that:

$$\hat{\sigma}_{\mathrm{LGD}} < \sqrt{\hat{\mu}_{\mathrm{LGD}} \left(1 - \hat{\mu}_{\mathrm{LGD}}\right)}$$

because $\hat{\alpha}$ and $\hat{\beta}$ must be positive. This condition is not very restrictive. Indeed, if we consider
a general random variable $X$ on $[0, 1]$, we have $\mathbb{E}[X^2] \le \mathbb{E}[X]$, implying that:

$$\sigma(X) \le \sigma^{+}(\mu) = \sqrt{\mu (1 - \mu)}$$

where $\mu = \mathbb{E}[X]$. Therefore, only the limit case cannot be reached by the beta distribution⁸⁴.
However, we notice that the standard deviation cannot be arbitrarily fixed at a high level. For
example, Figure 3.26 shows that there is no random variable on $[0, 1]$ such that $\mu = 10\%$
and $\sigma > 30\%$, $\mu = 20\%$ and $\sigma > 40\%$, $\mu = 50\%$ and $\sigma > 50\%$, etc.


In Figure 3.27, we have reported the calibrated beta distribution using the method of
moments for several values of $\mu_{\mathrm{LGD}}$ and $\sigma_{\mathrm{LGD}} = 30\%$. We obtain U-shaped probability
distributions. In order to obtain a concave (or bell-shaped) distribution, the standard deviation
$\sigma_{\mathrm{LGD}}$ must be lower (see Figure 3.28).


Remark 37 The previous figures may lead us to believe that the standard deviation must
be very low in order to obtain a concave beta probability density function. In fact, this is not
a restriction due to the beta distribution, but one due to the support $[0, 1]$ of the random
variable. Indeed, we can show that the standard deviation is bounded⁸⁵ by $\sqrt{1/12} \approx 28.87\%$
when the probability distribution has one mode on $[0, 1]$.


84 The limit case corresponds to the Bernoulli distribution $\mathcal{B}(p)$ where $p = \mu$.
85 The bound is the standard deviation of the uniform distribution $\mathcal{U}_{[0,1]}$.

FIGURE 3.28: Calibration of the beta distribution when $\sigma_{\mathrm{LGD}} = 10\%$




As noted by Altman and Kalotay (2014), the beta distribution is not always appropriate
for modeling loss given default even if it is widely used by the industry. Indeed, we observe
that losses given default tend to be bimodal, meaning that the recovery rate is quite
high or quite low (Loterman et al., 2012). This is why Altman and Kalotay (2014) propose
to model the loss given default as a Gaussian mixture model. They first apply the transformation
$y_i = \Phi^{-1}(\mathrm{LGD}_i)$ to the sample, then calibrate⁸⁶ the 4-component mixture model on
the transformed data $(y_1, \ldots, y_n)$ and finally perform the inverse transform for estimating
the parametric distribution. They show that the estimated distribution fits relatively well
the non-parametric distribution estimated with the kernel method.


Using a non-parametric distribution The beta distribution is either bell-shaped or
U-shaped. In the latter case, the limit is the Bernoulli distribution:


LGD         | 0%                          100%
Probability | $1 - \mu_{\mathrm{LGD}}$    $\mu_{\mathrm{LGD}}$


This model is not necessarily absurd, since it means that the recovery can be very high or 
very low. Figure 2 in Bellotti and Crook (2012) represents the histogram of recovery rates of 
55 000 defaulted credit card accounts from 1999 to 2005 in the UK. The two extreme cases 
(R = 0% and R = 100%) are the most frequent cases. Therefore, it is interesting to consider 
the empirical distribution instead of an estimated distribution. In this case, we generally 
consider risk classes, e.g. 0% — 5%, 5% — 10%, 10% — 20%,..., 80% — 90%, 90% — 100%. 


Example 35 We consider the following empirical distribution of LGD: 


LGD (in %) | 0  10  20  25  30  40  50  60  70  75  80  90  100
p (in %)   | 1   2  10  25  10   2   0   2  10  25  10   2    1


This example illustrates the shortcoming of the beta modeling when we have a bimodal 
LGD distribution. In Figure 3.29, we have reported the empirical distribution, and the 
corresponding (rescaled) calibrated beta distribution. We notice that it is very far from the 
empirical distribution. 


Remark 38 Instead of using the empirical distribution by risk classes, we can also consider 
the kernel approach, which is described on page 637. 


Example 36 We consider a credit portfolio of 10 loans, whose loss is equal to:

$$L = \sum_{i=1}^{10} \mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\}$$

where the maturity $T_i$ is equal to 5 years, the exposure at default $\mathrm{EAD}_i$ is equal to \$1000
and the default time $\tau_i$ is exponential with the following intensity parameter $\lambda_i$:


$i$                  | 1   2   3   4   5   6    7    8    9    10
$\lambda_i$ (in bps) | 10  10  25  25  50  100  250  500  500  1000


The loss given default $\mathrm{LGD}_i$ is given by the empirical distribution, which is described in
Example 35.


86 The estimation of Gaussian mixture models is presented on page 624.

FIGURE 3.29: Calibration of a bimodal LGD distribution


In Figure 3.30, we have calculated the distribution of the portfolio loss with the Monte
Carlo method. We compare the loss distribution when we consider the empirical distribution
and the calibrated beta distribution for the loss given default. We also report the loss
distribution when we replace the random variable $\mathrm{LGD}_i$ by its expected value $\mathbb{E}[\mathrm{LGD}_i] = 50\%$.
We observe that the shape of $L$ highly depends on the LGD model. For example, we
observe a more pronounced fat tail with the calibrated beta distribution. This implies that
the LGD model has a big impact on the calculation of the value-at-risk. For instance, we have
reported the loss distribution using the beta model for different values of $(\mu_{\mathrm{LGD}}, \sigma_{\mathrm{LGD}})$
in Figure 3.31. We conclude that the modeling of LGD must not be overlooked. In many
cases, the model errors have more impact when they concern the loss given default than the
probability of default.
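
A Monte Carlo experiment of this kind can be sketched as follows. This is a minimal illustration and not the code used to produce Figure 3.30: the number of simulations, the seed and the function names are our own choices, and only two LGD specifications are compared (the empirical distribution of Example 35 and the constant $\mathbb{E}[\mathrm{LGD}_i] = 50\%$).

```python
import math
import random

EAD = 1000.0      # exposure at default of each loan
MATURITY = 5.0    # maturity T_i in years
# Default intensities of Example 36, converted from bps to decimals
INTENSITIES = [bps / 1e4 for bps in (10, 10, 25, 25, 50, 100, 250, 500, 500, 1000)]
# Empirical LGD distribution of Example 35 (values in %, weights in %)
LGD_VALUES = [v / 100 for v in (0, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 100)]
LGD_WEIGHTS = [1, 2, 10, 25, 10, 2, 0, 2, 10, 25, 10, 2, 1]
MEAN_LGD = sum(v * w for v, w in zip(LGD_VALUES, LGD_WEIGHTS)) / sum(LGD_WEIGHTS)


def simulate_loss(rng, random_lgd):
    """Draw one portfolio loss L = sum_i EAD_i * LGD_i * 1{tau_i <= T_i}."""
    loss = 0.0
    for intensity in INTENSITIES:
        if rng.expovariate(intensity) <= MATURITY:  # default occurs before maturity
            if random_lgd:
                lgd = rng.choices(LGD_VALUES, weights=LGD_WEIGHTS)[0]
            else:
                lgd = MEAN_LGD
            loss += EAD * lgd
    return loss


def empirical_quantile(losses, alpha):
    """Empirical quantile of order alpha (order statistic convention)."""
    ordered = sorted(losses)
    return ordered[min(len(ordered) - 1, math.ceil(alpha * len(ordered)) - 1)]


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
n_sims = 20_000
losses_random_lgd = [simulate_loss(rng, True) for _ in range(n_sims)]
losses_mean_lgd = [simulate_loss(rng, False) for _ in range(n_sims)]

# Analytical expected loss, identical under both LGD specifications
expected_loss = sum(EAD * MEAN_LGD * (1.0 - math.exp(-lam * MATURITY)) for lam in INTENSITIES)
print(f"analytical expected loss = {expected_loss:.1f}")
for label, losses in (("empirical LGD", losses_random_lgd), ("mean LGD", losses_mean_lgd)):
    print(f"{label}: mean = {sum(losses) / n_sims:.1f}, "
          f"99% quantile = {empirical_quantile(losses, 0.99):.1f}")
```

Both specifications share the same expected loss by construction, so any difference between them appears in the shape and the quantiles of the loss distribution.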


Remark 39 The expression of the portfolio loss is:

$$L = \sum_{i=1}^{n} \mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\}$$

If the portfolio is fine-grained, we have:

$$\mathbb{E}[L \mid X] = \sum_{i=1}^{n} \mathrm{EAD}_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i(X)$$

We deduce that the distribution of the portfolio loss is equal to:

$$\Pr\{L \le \ell\} = \int \cdots \int \mathbb{1}\left\{\sum_{i=1}^{n} \mathrm{EAD}_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot p_i(x) \le \ell\right\} \mathrm{d}H(x)$$

This loss distribution does not depend on the random variables $\mathrm{LGD}_i$, but on their expected
values $\mathbb{E}[\mathrm{LGD}_i]$. This implies that it is not necessary to model the loss given default, but
only the mean. Therefore, we can replace the previous expression of the portfolio loss by:

$$L = \sum_{i=1}^{n} \mathrm{EAD}_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \mathbb{1}\{\tau_i \le T_i\}$$

3.3.2.3 Economic modeling 


There are many factors that influence the recovery process. In the case of corporate 
debt, we distinguish between specific and economic factors. For instance, specific factors 
are the relative seniority of the debt or the guarantees. Senior debt must be repaid before 
subordinated or junior debt is repaid. If the debt is collateralized, this affects the loss given 
default. Economic factors are essentially the business cycle and the industry. In the third 
version of Moody’s LossCalc, Dwyer and Korablev (2009) consider seven factors that are 
grouped in three major categories: 


1. factors external to the issuer: geography, industry, credit cycle stage;


2. factors specific to the issuer: distance-to-default, probability of default (or leverage 
for private firms); 


3. factors specific to the debt issuance: debt type, relative standing in capital structure, 
collateral. 


Curiously, Dwyer and Korablev (2009) explain that “some regions have been characterized
as creditor-friendly, while others are considered more creditor-unfriendly”. For instance,
recovery rates are lower in the UK and Europe than in the rest of the world. However, the
most important factors are the seniority followed by the industry, as illustrated by
Moody’s statistics on ultimate recoveries. From 1987 to 2017, the average corporate debt
recovery rate is equal to 80.4% for loans, 62.3% for senior secured bonds, 47.9% for senior
unsecured bonds and 28.0% for subordinated bonds (Moody’s, 2018). It is interesting to
notice that the recovery rate and the probability of default are negatively correlated. Indeed,
Dwyer and Korablev (2009) take the example of two corporate firms A and B, and they
assume that $\mathrm{PD}_B > \mathrm{PD}_A$. In this case, we may think that the ratio of assets to
liabilities is larger for A than for B. Therefore, we must observe a positive relationship
between the loss given default and the probability of default.


Remark 40 The factors depend on the asset class. For instance, we will consider more
microeconomic variables when modeling the loss given default for mortgage loans (Tong et
al., 2013).


Once the factors are identified, we must estimate the LGD model:

$$\mathrm{LGD} = f(X_1, \ldots, X_m)$$

where $X_1, \ldots, X_m$ are the $m$ factors, and $f$ is a non-linear function. Generally, we consider
a transformation of LGD in order to obtain a more tractable variable. We can apply a logit
transform $Y = \ln(\mathrm{LGD}) - \ln(1 - \mathrm{LGD})$, a probit transform $Y = \Phi^{-1}(\mathrm{LGD})$ or a beta
transformation (Bellotti and Crook, 2012). In this case, we can use the different statistical
tools given in Chapters 10 and 15 to model the random variable $Y$. The most popular models
are the logistic regression, regression trees and neural networks (Bastos, 2010). However,
according to EBA (2017), multivariate regression remains the most widely used method,
despite the strong development of machine learning techniques, which are presented on page
943.
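
The logit and probit transforms mentioned above, together with their inverses, can be written with the Python standard library. This is a small sketch: the helper `clip_lgd` and its threshold `eps` are our own assumptions, added because both transforms are undefined when LGD is exactly 0 or 1.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # N(0, 1)


def clip_lgd(lgd, eps=1e-4):
    """Keep LGD strictly inside (0, 1); eps is an arbitrary illustrative threshold."""
    return min(max(lgd, eps), 1.0 - eps)


def logit(lgd):
    """Y = ln(LGD) - ln(1 - LGD)."""
    return math.log(lgd) - math.log(1.0 - lgd)


def inverse_logit(y):
    return 1.0 / (1.0 + math.exp(-y))


def probit(lgd):
    """Y = Phi^{-1}(LGD)."""
    return _STANDARD_NORMAL.inv_cdf(lgd)


def inverse_probit(y):
    return _STANDARD_NORMAL.cdf(y)


for lgd in (0.0, 0.25, 0.50, 0.90, 1.0):
    x = clip_lgd(lgd)
    print(f"LGD = {lgd:.2f}: logit = {logit(x):+.4f}, probit = {probit(x):+.4f}")
```

A model fitted on $Y$ is mapped back to the unit interval with the corresponding inverse transform.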
Remark 41 We do not develop here the econometric approach, because it is extensively
presented in Chapter 15 dedicated to credit scoring. Indeed, statistical models of LGD
use the same methods as statistical models of PD. We also refer to Chapter 14 dedicated
to stress testing methods when we would like to calculate stressed LGD parameters.


3.3.3 Probability of default 
3.3.3.1 Survival function 
The survival function is the main tool to characterize the probability of default. It is
also known as reduced-form modeling.


Definition and main properties Let $\tau$ be a default (or survival) time. The survival
function⁸⁷ is defined as follows:

$$S(t) = \Pr\{\tau > t\} = 1 - F(t)$$

where $F$ is the cumulative distribution function. We deduce that the probability density
function is related to the survival function in the following manner:

$$f(t) = -\frac{\partial S(t)}{\partial t} \tag{3.43}$$

In survival analysis, the key concept is the hazard function $\lambda(t)$, which is the instantaneous
default rate given that the default has not occurred before $t$:

$$\lambda(t) = \lim_{\mathrm{d}t \to 0^{+}} \frac{\Pr\{t \le \tau \le t + \mathrm{d}t \mid \tau \ge t\}}{\mathrm{d}t}$$

We deduce that:

$$\lambda(t) = \lim_{\mathrm{d}t \to 0^{+}} \frac{\Pr\{t \le \tau \le t + \mathrm{d}t\}}{\mathrm{d}t} \cdot \frac{1}{\Pr\{\tau \ge t\}} = \frac{f(t)}{S(t)}$$

Using Equation (3.43), another expression of the hazard function is:

$$\lambda(t) = -\frac{\partial_t S(t)}{S(t)} = -\frac{\partial \ln S(t)}{\partial t}$$

The survival function can then be rewritten with respect to the hazard function and we
have:

$$S(t) = e^{-\int_0^t \lambda(s)\, \mathrm{d}s} \tag{3.44}$$


In Table 3.32, we have reported the most common hazard and survival functions. They can
be extended by adding explanatory variables in order to obtain proportional hazard models
(Cox, 1972). In this case, the expression of the hazard function is $\lambda(t) = \lambda_0(t) \exp(\beta^{\top} x)$
where $\lambda_0(t)$ is the baseline hazard rate and $x$ is the vector of explanatory variables, which
are not dependent on time.


87 Previously, we have noted the survival function as $S_{t_0}(t)$. Here, we assume that the current time $t_0$ is
0.

TABLE 3.32: Common survival functions


Model        | $S(t)$                               | $\lambda(t)$
Exponential  | $\exp(-\lambda t)$                   | $\lambda$
Weibull      | $\exp(-\lambda t^{\gamma})$          | $\lambda \gamma t^{\gamma - 1}$
Log-normal   | $1 - \Phi(\gamma \ln(\lambda t))$    | $\gamma t^{-1} \phi(\gamma \ln(\lambda t)) / (1 - \Phi(\gamma \ln(\lambda t)))$
Log-logistic | $1 / (1 + \lambda t^{1/\gamma})$     | $\lambda \gamma^{-1} t^{1/\gamma} / (t + \lambda t^{1 + 1/\gamma})$
Gompertz     | $\exp(\lambda (1 - e^{\gamma t}))$   | $\lambda \gamma \exp(\gamma t)$


The exponential model holds a special place in default time models. It can be justified 
by the following problem in physics: 


“Assume that a system consists of n identical components which are connected
in series. This means that the system fails as soon as one of the components fails.
One can assume that the components function independently. Assume further
that the random time interval until the failure of the system is one n-th of the
time interval of component failure” (Galambos, 1982).


We have $\Pr\{\min(\tau_1, \ldots, \tau_n) \le t\} = \Pr\{\tau_1 \le n \cdot t\}$. The problem is then equivalent to solving
the functional equation $S(t) = S^{n}(t/n)$ with $S(t) = \Pr\{\tau_1 > t\}$. We can show that the
unique solution for $n \ge 1$ is the exponential distribution. Following Galambos and Kotz
(1978), its other main properties are:


1. the mean residual life $\mathbb{E}[\tau - t \mid \tau \ge t]$ is constant;


2. it satisfies the famous lack of memory property:

$$\Pr\{\tau \ge t + u \mid \tau \ge t\} = \Pr\{\tau \ge u\}$$

or equivalently $S(t + u) = S(t) S(u)$;

3. the probability distribution of $n \cdot \tau_{1:n}$ is the same as the probability distribution of $\tau_i$.
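
Piecewise exponential model In credit risk models, the standard probability distribution
to define default times is a generalization of the exponential model by considering
piecewise constant hazard rates:

$$\lambda(t) = \sum_{m=1}^{M} \lambda_m \cdot \mathbb{1}\{t^{\star}_{m-1} < t \le t^{\star}_{m}\} = \lambda_m \quad \text{if } t \in \left]t^{\star}_{m-1}, t^{\star}_{m}\right]$$

where $t^{\star}_{m}$ are the knots of the function⁸⁸. For $t \in \left]t^{\star}_{m-1}, t^{\star}_{m}\right]$, the expression of the survival
function becomes:

$$S(t) = \exp\left(-\sum_{k=1}^{m-1} \lambda_k \left(t^{\star}_{k} - t^{\star}_{k-1}\right) - \lambda_m \left(t - t^{\star}_{m-1}\right)\right)$$

88 We have $t^{\star}_{0} = 0$ and …

It follows that the density function is equal to⁸⁹:

$$f(t) = \lambda_m \exp\left(-\sum_{k=1}^{m-1} \lambda_k \left(t^{\star}_{k} - t^{\star}_{k-1}\right) - \lambda_m \left(t - t^{\star}_{m-1}\right)\right)$$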


In Figure 3.32, we have reported the hazard, survival and density functions for three sets
of parameters $\{(t^{\star}_{m}, \lambda_m), m = 1, \ldots, M\}$:


{(1, 1%), (2, 1.5%), (3, 2%), (4, 2.5%), (∞, 3%)} for $\lambda_1(t)$
{(1, 10%), (2, 7%), (5, 5%), (7, 4.5%), (∞, 6%)} for $\lambda_2(t)$


and $\lambda_3(t) = 4\%$. We note the special shape of the density function, which is not smooth at
the knots.
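
The piecewise exponential model is straightforward to code. The sketch below is one possible implementation, with our own function names: it builds the hazard, survival and density functions from the knots and the hazard rates, using the first parameter set above as an example.

```python
import math


def piecewise_exponential(knots, rates):
    """Build hazard, survival and density functions for knots t*_1 < ... < t*_M.

    rates[m] applies on the interval ]knots[m-1], knots[m]], with t*_0 = 0.
    """
    def hazard(t):
        for knot, rate in zip(knots, rates):
            if t <= knot:
                return rate
        raise ValueError("t lies beyond the last knot")

    def survival(t):
        cumulative, previous = 0.0, 0.0
        for knot, rate in zip(knots, rates):
            if t <= knot:
                # integrated hazard = full past intervals + partial current interval
                return math.exp(-(cumulative + rate * (t - previous)))
            cumulative += rate * (knot - previous)
            previous = knot
        raise ValueError("t lies beyond the last knot")

    def density(t):
        return hazard(t) * survival(t)

    return hazard, survival, density


# First parameter set: {(1, 1%), (2, 1.5%), (3, 2%), (4, 2.5%), (inf, 3%)}
knots_1 = [1.0, 2.0, 3.0, 4.0, math.inf]
rates_1 = [0.01, 0.015, 0.02, 0.025, 0.03]
hazard_1, survival_1, density_1 = piecewise_exponential(knots_1, rates_1)

for t in (0.5, 1.0, 2.5, 5.0, 10.0):
    print(f"t = {t:4.1f}: hazard = {hazard_1(t):.3%}, "
          f"S(t) = {survival_1(t):.4f}, f(t) = {density_1(t):.5f}")
```

The density is obtained as $f(t) = \lambda(t) S(t)$, which makes its jumps at the knots explicit: $S(t)$ is continuous whereas $\lambda(t)$ is not.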


[Figure 3.32, three panels: hazard function $\lambda(t)$, survival function $S(t)$ and density function $f(t)$ for $\lambda_1(t)$, $\lambda_2(t)$ and $\lambda_3(t)$; horizontal axis: $t$ (in years)]


FIGURE 3.32: Example of the piecewise exponential model


Estimation To estimate the parameters of the survival function, we can use the cohort
approach. Under this method, we estimate the empirical survival function by counting the
number of entities for a given population that do not default over the period $\Delta t$:

$$\hat{S}(\Delta t) = 1 - \frac{\sum_{i=1}^{n} \mathbb{1}\{t < \tau_i \le t + \Delta t\}}{n}$$

where $n$ is the number of entities that compose the population. We can then fit the survival
function by using for instance the least squares method.


89 We verify that:

$$\frac{f(t)}{S(t)} = \lambda_m \quad \text{if } t \in \left]t^{\star}_{m-1}, t^{\star}_{m}\right]$$

Example 37 We consider a population of 1000 corporate firms. The number of defaults
$n_D(\Delta t)$ over the period $\Delta t$ is given in the table below:


$\Delta t$ (in months) | 3  6  9  12  15  18  21  24
$n_D(\Delta t)$        | 2  5  9  12  16  20  25  29


We obtain $\hat{S}(0.25) = 0.998$, $\hat{S}(0.50) = 0.995$, $\hat{S}(0.75) = 0.991$, $\hat{S}(1.00) = 0.988$,
$\hat{S}(1.25) = 0.984$, $\hat{S}(1.50) = 0.980$, $\hat{S}(1.75) = 0.975$ and $\hat{S}(2.00) = 0.971$. For the exponential
model, the least squares estimator $\hat{\lambda}$ is equal to 1.375%. In the case of the Gompertz
survival function, we obtain $\hat{\lambda} = 2.718\%$ and $\hat{\gamma} = 0.370$. If we consider the piecewise exponential
model, whose knots correspond to the different periods $\Delta t$, we have $\hat{\lambda}_1 = 0.796\%$,
$\hat{\lambda}_2 = 1.206\%$, $\hat{\lambda}_3 = 1.611\%$, $\hat{\lambda}_4 = 1.216\%$, $\hat{\lambda}_5 = 1.617\%$, $\hat{\lambda}_6 = 1.640\%$, $\hat{\lambda}_7 = 2.044\%$ and
$\hat{\lambda}_8 = 1.642\%$. To compare these three calibrations, we report the corresponding hazard
functions in Figure 3.33. We deduce that the one-year default probability⁹⁰ is respectively
equal to 1.366%, 1.211% and 1.200%.

FIGURE 3.33: Estimated hazard function


In the piecewise exponential model, we can specify an arbitrary number of knots. In the
previous example, we use the same number of knots as the number of observations to
calibrate. In such a case, we can calibrate the parameters using the following iterative process:


1. We first estimate the parameter $\hat{\lambda}_1$ for the earliest maturity $\Delta t_1$.


2. Assuming that $(\hat{\lambda}_1, \ldots, \hat{\lambda}_{i-1})$ have been estimated, we calculate $\hat{\lambda}_i$ for the next
maturity $\Delta t_i$.

3. We iterate step 2 until the last maturity $\Delta t_M$.
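
The iterative process above can be sketched in Python on the data of Example 37. We assume here that each hazard rate is chosen so that the fitted survival function matches the observed value exactly at each knot, i.e. $\hat{\lambda}_i = (\ln \hat{S}(t_{i-1}) - \ln \hat{S}(t_i)) / (t_i - t_{i-1})$; the function name is ours. Exact matching gives values that are close to, but not identical to, the estimates reported in Example 37.

```python
import math


def bootstrap_hazard_rates(maturities, survival_values):
    """Solve sequentially for piecewise constant hazard rates matching S at each knot."""
    rates = []
    previous_t, previous_s = 0.0, 1.0  # S(0) = 1
    for t, s in zip(maturities, survival_values):
        # one equation, one unknown: S(t) = S(previous_t) * exp(-rate * (t - previous_t))
        rates.append((math.log(previous_s) - math.log(s)) / (t - previous_t))
        previous_t, previous_s = t, s
    return rates


# Data of Example 37: cumulative defaults among 1000 firms, observed every 3 months
defaults = [2, 5, 9, 12, 16, 20, 25, 29]
maturities = [0.25 * k for k in range(1, len(defaults) + 1)]
survival_values = [1.0 - d / 1000 for d in defaults]

rates = bootstrap_hazard_rates(maturities, survival_values)
for t, rate in zip(maturities, rates):
    print(f"knot {t:.2f}y: hazard rate = {rate:.3%}")

# One-year default probability implied by the calibrated model
pd_one_year = 1.0 - math.exp(-sum(0.25 * rate for rate in rates[:4]))
print(f"one-year PD = {pd_one_year:.3%}")
```

Because the knots coincide with the observation dates, each step only involves one equation with one unknown, which is what makes the procedure sequential.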


90 We have $\mathrm{PD} = 1 - S(1)$.

This algorithm works well if the knots $t^{\star}_{m}$ exactly match the maturities. It is known as
the bootstrap method and is very popular to estimate the survival function from market
prices. Let $\{s(T_1), \ldots, s(T_M)\}$ be a set of CDS spreads for a given name. Assuming that
$T_1 < T_2 < \ldots < T_M$, we consider the piecewise exponential model with $t^{\star}_{m} = T_m$. We
first estimate $\hat{\lambda}_1$ such that the theoretical spread is equal to $s(T_1)$. We then calibrate the
hazard function in order to retrieve the spread $s(T_2)$ of the second maturity. This means
that $\lambda(t)$ is known and equal to $\hat{\lambda}_1$ until time $T_1$, whereas $\lambda(t)$ is unknown from
$T_1$ to $T_2$:

$$\lambda(t) = \begin{cases} \hat{\lambda}_1 & \text{if } t \in \left]0, T_1\right] \\ \lambda_2 & \text{if } t \in \left]T_1, T_2\right] \end{cases}$$

Estimating $\hat{\lambda}_2$ is therefore straightforward because it is equivalent to solving one equation
with one variable. We proceed in a similar way for the other maturities.


Example 38 We assume that the term structure of interest rates is generated by the Nelson-
Siegel model with $\theta_1 = 5\%$, $\theta_2 = -5\%$, $\theta_3 = 6\%$ and $\theta_4 = 10$. We consider three credit
curves, whose CDS spreads expressed in bps are given in the following table:


Maturity (in years) |  #1    #2    #3
 1                  |  50    50   350
 3                  |  60    60   370
 5                  |  70    90   390
 7                  |  80   115   385
10                  |  90   125   370


The recovery rate $\mathcal{R}$ is set to 40%.


TABLE 3.33: Calibrated piecewise exponential model from CDS prices


Maturity (in years) |   #1      #2      #3
 1                  |  83.3    83.3   582.9
 3                  | 110.1   110.1   637.5
 5                  | 140.3   235.0   702.0
 7                  | 182.1   289.6   589.4
10                  | 194.1   241.9   498.5

Using the bootstrap method, we obtain the results in Table 3.33. We notice that the piecewise
exponential models coincide for the credit curves #1 and #2 for $t \le 3$ years. This is normal
because the CDS spreads of the two credit curves are equal when the maturity is less than or
equal to 3 years. The third credit curve illustrates that the bootstrap method is highly
sensitive to small differences. Indeed, the calibrated intensity parameter varies from 499 to
702 bps while the CDS spreads vary from 350 to 390 bps. Finally, the survival functions
associated with these 3 bootstrap calibrations are shown in Figure 3.34.


Remark 42 Other methods for estimating the probability of default are presented in Chapter
15 dedicated to credit scoring models.


FIGURE 3.34: Calibrated survival function from CDS prices


3.3.3.2 Transition probability matrix 


When dealing with risk classes, it is convenient to model a transition probability matrix. 
For instance, this approach is used for modeling credit rating migration. 


Discrete-time modeling We consider a time-homogeneous Markov chain $\mathfrak{R}$, whose
transition matrix is $P = (p_{i,j})$. We note $\mathcal{S} = \{1, 2, \ldots, K\}$ the state space of the chain and $p_{i,j}$
is the probability that the entity migrates from rating $i$ to rating $j$. The matrix $P$ satisfies
the following properties:


• $\forall\, i, j \in \mathcal{S}$, $p_{i,j} \ge 0$;


• $\forall\, i \in \mathcal{S}$, $\sum_{j=1}^{K} p_{i,j} = 1$.


In credit risk, we generally assume that $K$ is the absorbing state (or the default state),
implying that any entity which has reached this state remains in this state. In this case, we
have $p_{K,K} = 1$. Let $\mathfrak{R}(t)$ be the value of the state at time $t$. We define $p(s, i; t, j)$ as the
probability that the entity reaches the state $j$ at time $t$ given that it has reached the state
$i$ at time $s$. We have:

$$p(s, i; t, j) = \Pr\{\mathfrak{R}(t) = j \mid \mathfrak{R}(s) = i\} = p_{i,j}^{(t-s)}$$

This probability only depends on the duration between $s$ and $t$ because the Markov chain is
time-homogeneous. Therefore, we can restrict the analysis by calculating the $n$-step transition
probability:

$$p_{i,j}^{(n)} = \Pr\{\mathfrak{R}(t + n) = j \mid \mathfrak{R}(t) = i\}$$

and the associated $n$-step transition matrix $P^{(n)} = (p_{i,j}^{(n)})$. For $n = 2$, we obtain:

$$\begin{aligned}
p_{i,j}^{(2)} &= \Pr\{\mathfrak{R}(t + 2) = j \mid \mathfrak{R}(t) = i\} \\
&= \sum_{k=1}^{K} \Pr\{\mathfrak{R}(t + 2) = j, \mathfrak{R}(t + 1) = k \mid \mathfrak{R}(t) = i\} \\
&= \sum_{k=1}^{K} \Pr\{\mathfrak{R}(t + 2) = j \mid \mathfrak{R}(t + 1) = k\} \cdot \Pr\{\mathfrak{R}(t + 1) = k \mid \mathfrak{R}(t) = i\} \\
&= \sum_{k=1}^{K} p_{i,k} \cdot p_{k,j}
\end{aligned}$$

In a similar way, we obtain:

$$p_{i,j}^{(n+m)} = \sum_{k=1}^{K} p_{i,k}^{(n)} \cdot p_{k,j}^{(m)} \qquad \forall\, n, m > 0 \tag{3.45}$$

This equation is called the Chapman-Kolmogorov equation. In matrix form, we have:

$$P^{(n+m)} = P^{(n)} \cdot P^{(m)}$$

with the convention $P^{(0)} = I$. In particular, we have:

$$\begin{aligned}
P^{(n)} &= P^{(n-1)} \cdot P^{(1)} \\
&= P^{(n-2)} \cdot P^{(1)} \cdot P^{(1)} \\
&= \prod_{t=1}^{n} P^{(1)} \\
&= P^{n}
\end{aligned}$$

We deduce that:

$$p(t, i; t + n, j) = p_{i,j}^{(n)} = e_i^{\top} P^{n} e_j \tag{3.46}$$

When we apply this framework to credit risk, $\mathfrak{R}(t)$ denotes the rating (or the risk class)
of the firm at time $t$, $p_{i,j}$ is the one-period transition probability from rating $i$ to rating $j$
and $p_{i,K}$ is the one-period default probability of rating $i$. In Table 3.34, we report the S&P
one-year transition probability matrix for corporate bonds estimated by Kavvathas (2001).
We read the figures as follows⁹¹: a firm rated AAA has a one-year probability of 92.82% to
remain AAA; its probability to become AA is 6.50%; a firm rated CCC defaults one year
later with a probability equal to 23.50%; etc. In Tables 3.35 and 3.36, we have reported the
two-year and five-year transition probability matrices. We detail below the calculation of
$p^{(2)}_{\mathrm{AAA},\mathrm{AAA}}$:

$$\begin{aligned}
p^{(2)}_{\mathrm{AAA},\mathrm{AAA}} &= p_{\mathrm{AAA},\mathrm{AAA}} \times p_{\mathrm{AAA},\mathrm{AAA}} + p_{\mathrm{AAA},\mathrm{AA}} \times p_{\mathrm{AA},\mathrm{AAA}} + p_{\mathrm{AAA},\mathrm{A}} \times p_{\mathrm{A},\mathrm{AAA}} + {} \\
&\quad\; p_{\mathrm{AAA},\mathrm{BBB}} \times p_{\mathrm{BBB},\mathrm{AAA}} + p_{\mathrm{AAA},\mathrm{BB}} \times p_{\mathrm{BB},\mathrm{AAA}} + p_{\mathrm{AAA},\mathrm{B}} \times p_{\mathrm{B},\mathrm{AAA}} + {} \\
&\quad\; p_{\mathrm{AAA},\mathrm{CCC}} \times p_{\mathrm{CCC},\mathrm{AAA}} \\
&= 0.9282^{2} + 0.0650 \times 0.0063 + 0.0056 \times 0.0008 + 0.0006 \times 0.0005 + 0.0006 \times 0.0004 \\
&= 86.1970\%
\end{aligned}$$

TABLE 3.34: Example of credit migration matrix (in %)


       AAA     AA      A    BBB     BB      B    CCC       D
AAA  92.82   6.50   0.56   0.06   0.06   0.00   0.00    0.00
AA    0.63  91.87   6.64   0.65   0.06   0.11   0.04    0.00
A     0.08   2.26  91.66   5.11   0.61   0.23   0.01    0.04
BBB   0.05   0.27   5.84  87.74   4.74   0.98   0.16    0.22
BB    0.04   0.11   0.64   7.85  81.14   8.27   0.89    1.06
B     0.00   0.11   0.30   0.42   6.75  83.07   3.86    5.49
CCC   0.19   0.00   0.38   0.75   2.44  12.03  60.71   23.50
D     0.00   0.00   0.00   0.00   0.00   0.00   0.00  100.00


Source: Kavvathas (2001).
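
The $n$-step transition matrices follow from Equation (3.46) by repeated matrix multiplication. The sketch below uses plain Python lists to stay dependency-free; the helper names `mat_mul` and `mat_pow` are ours, and the input is the rounded matrix of Table 3.34.

```python
RATINGS = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D"]

# One-year migration matrix of Table 3.34 (in %)
P_PERCENT = [
    [92.82, 6.50, 0.56, 0.06, 0.06, 0.00, 0.00, 0.00],
    [0.63, 91.87, 6.64, 0.65, 0.06, 0.11, 0.04, 0.00],
    [0.08, 2.26, 91.66, 5.11, 0.61, 0.23, 0.01, 0.04],
    [0.05, 0.27, 5.84, 87.74, 4.74, 0.98, 0.16, 0.22],
    [0.04, 0.11, 0.64, 7.85, 81.14, 8.27, 0.89, 1.06],
    [0.00, 0.11, 0.30, 0.42, 6.75, 83.07, 3.86, 5.49],
    [0.19, 0.00, 0.38, 0.75, 2.44, 12.03, 60.71, 23.50],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 100.00],
]
P = [[x / 100.0 for x in row] for row in P_PERCENT]


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(p, n):
    """n-step transition matrix P^n, with the convention P^0 = I."""
    result = [[float(i == j) for j in range(len(p))] for i in range(len(p))]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


P2 = mat_pow(P, 2)
P5 = mat_pow(P, 5)
print(f"two-year probability AAA -> AAA: {P2[0][0]:.4%}")
for rating, row in zip(RATINGS, P5):
    print(f"five-year default probability of {rating}: {row[-1]:.2%}")
```

The last column of $P^{n}$ gives the cumulative $n$-year default probability of each initial rating.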


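We note $\pi_i^{(n)}$ the probability of the state $i$ at time $n$:

$$\pi_i^{(n)} = \Pr\{\mathfrak{R}(n) = i\}$$

and $\pi^{(n)} = (\pi_1^{(n)}, \ldots, \pi_K^{(n)})$ the probability distribution. By construction, we have:

$$\pi^{(n+1)} = P^{\top} \pi^{(n)}$$

The Markov chain $\mathfrak{R}$ admits a stationary distribution $\pi^{\star}$ if⁹²:

$$\pi^{\star} = P^{\top} \pi^{\star}$$

In this case, $\pi_i^{\star}$ is the limiting probability of state $i$:

$$\lim_{n \to \infty} \pi_i^{(n)} = \pi_i^{\star}$$

We can interpret $\pi_i^{\star}$ as the average duration spent by the chain $\mathfrak{R}$ in the state $i$. Let $\mathcal{T}_i$ be
the return period⁹³ of state $i$:

$$\mathcal{T}_i = \inf\{n : \mathfrak{R}(n) = i \mid \mathfrak{R}(0) = i\}$$

The average return period is then equal to:

$$\mathbb{E}[\mathcal{T}_i] = \frac{1}{\pi_i^{\star}}$$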

For credit migration matrices, there is no stationary distribution because the long-term
rating $\mathfrak{R}(\infty)$ is the absorbing state as noted by Jafry and Schuermann:


“Given sufficient time, all firms will eventually sink to the default state. This 
behavior is clearly a mathematical artifact, stemming from the idealized linear, 
time invariant assumptions inherent in the simple Markov model. In reality 
the economy (and hence the migration matrix) will change on time-scales far 
shorter than required to reach the idealized default steady-state proscribed by an 
assumed constant migration matrix” (Jafry and Schuermann, 2004, page 2609). 


91 The rows represent the initial rating whereas the columns indicate the final rating.
92 Not all Markov chains behave in this way, meaning that $\pi^{\star}$ does not necessarily exist.
93 This concept plays an important role when designing stress scenarios (see Chapter 14).

TABLE 3.35: Two-year transition probability matrix $P^{2}$ (in %)


       AAA     AA      A    BBB     BB      B    CCC       D
AAA  86.20  12.02   1.47   0.18   0.11   0.01   0.00    0.00
AA    1.17  84.59  12.23   1.51   0.18   0.22   0.07    0.02
A     0.16   4.17  84.47   9.23   1.31   0.51   0.04    0.11
BBB   0.10   0.63  10.53  77.66   8.11   2.10   0.32    0.56
BB    0.08   0.24   1.60  13.33  66.79  13.77   1.59    2.60
B     0.01   0.21   0.61   1.29  11.20  70.03   5.61   11.03
CCC   0.29   0.04   0.68   1.37   4.31  17.51  37.34   38.45
D     0.00   0.00   0.00   0.00   0.00   0.00   0.00  100.00


TABLE 3.36: Five-year transition probability matrix $P^{5}$ (in %)


       AAA     AA      A    BBB     BB      B    CCC       D
AAA  69.23  23.85   5.49   0.96   0.31   0.12   0.02    0.03
AA    2.35  66.96  24.14   4.76   0.86   0.62   0.13    0.19
A     0.43   8.26  68.17  17.34   3.53   1.55   0.18    0.55
BBB   0.24   1.96  19.69  56.62  13.19   5.32   0.75    2.22
BB    0.17   0.73   5.17  21.23  40.72  20.53   2.71    8.74
B     0.07   0.47   1.73   4.67  16.53  44.95   5.91   25.68
CCC   0.38   0.24   1.37   2.92   7.13  18.51   9.92   59.53
D     0.00   0.00   0.00   0.00   0.00   0.00   0.00  100.00

We note that the survival function $S_i(t)$ of a firm whose initial rating is the state $i$ is
given by:

$$S_i(t) = 1 - \Pr\{\mathfrak{R}(t) = K \mid \mathfrak{R}(0) = i\} = 1 - e_i^{\top} P^{t} e_K \tag{3.47}$$

In the piecewise exponential model, we recall that the survival function has the following
expression:

$$S(t) = S(t^{\star}_{m-1})\, e^{-\lambda_m (t - t^{\star}_{m-1})}$$

for $t \in \left]t^{\star}_{m-1}, t^{\star}_{m}\right]$. We deduce that $S(t^{\star}_{m}) = S(t^{\star}_{m-1})\, e^{-\lambda_m (t^{\star}_{m} - t^{\star}_{m-1})}$, implying that:

$$\ln S(t^{\star}_{m}) = \ln S(t^{\star}_{m-1}) - \lambda_m (t^{\star}_{m} - t^{\star}_{m-1})$$

and:

$$\lambda_m = \frac{\ln S(t^{\star}_{m-1}) - \ln S(t^{\star}_{m})}{t^{\star}_{m} - t^{\star}_{m-1}}$$

It is then straightforward to estimate the piecewise hazard function:

• the knots of the piecewise function are the years $m \in \mathbb{N}^{\star}$;

• for each initial rating $i$, the hazard function $\lambda_i(t)$ is defined as:

$$\lambda_i(t) = \lambda_{i,m} \quad \text{if } t \in \left]m - 1, m\right]$$

where:

$$\lambda_{i,m} = \frac{\ln S_i(m - 1) - \ln S_i(m)}{m - (m - 1)} = \ln\left(\frac{1 - e_i^{\top} P^{m-1} e_K}{1 - e_i^{\top} P^{m} e_K}\right)$$

and $P^{0} = I$.

If we consider the credit migration matrix given in Table 3.34 and estimate the piecewise
exponential model, we obtain the hazard function⁹⁴ $\lambda_i(t)$ shown in Figure 3.35. For good
initial ratings, hazard rates are low for short maturities and increase with time. For bad
initial ratings, we obtain the opposite effect, because the firm can only improve its rating if
it did not default. We observe that the hazard function of all the ratings converges to the
same level, which is equal to 102.63 bps. This indicates the long-term hazard rate of the
Markov chain, meaning that about 1.03% of firms default every year on average.
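
The piecewise hazard rates $\lambda_{i,m}$ can be computed directly from the migration matrix. The following sketch is our own illustration (function name and 50-year horizon are arbitrary); it uses the rounded matrix of Table 3.34 and propagates the default column $P^{m} e_K$ year by year instead of forming the full matrix powers.

```python
import math

RATINGS = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC"]

# One-year migration matrix of Table 3.34 (in %); the last state D is absorbing
P_PERCENT = [
    [92.82, 6.50, 0.56, 0.06, 0.06, 0.00, 0.00, 0.00],
    [0.63, 91.87, 6.64, 0.65, 0.06, 0.11, 0.04, 0.00],
    [0.08, 2.26, 91.66, 5.11, 0.61, 0.23, 0.01, 0.04],
    [0.05, 0.27, 5.84, 87.74, 4.74, 0.98, 0.16, 0.22],
    [0.04, 0.11, 0.64, 7.85, 81.14, 8.27, 0.89, 1.06],
    [0.00, 0.11, 0.30, 0.42, 6.75, 83.07, 3.86, 5.49],
    [0.19, 0.00, 0.38, 0.75, 2.44, 12.03, 60.71, 23.50],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 100.00],
]
P = [[x / 100.0 for x in row] for row in P_PERCENT]


def piecewise_hazard_rates(p, n_years):
    """Return rates[m - 1][i] = lambda_{i,m} for each non-default rating i."""
    k = len(p)
    default_prob = [0.0] * (k - 1) + [1.0]  # P^0 e_K
    survival_previous = [1.0 - x for x in default_prob]
    rates = []
    for _ in range(n_years):
        # P^m e_K = P (P^(m-1) e_K): cumulative default probability by initial rating
        default_prob = [sum(p[i][j] * default_prob[j] for j in range(k)) for i in range(k)]
        survival = [1.0 - x for x in default_prob]
        rates.append([math.log(survival_previous[i] / survival[i]) for i in range(k - 1)])
        survival_previous = survival
    return rates


rates = piecewise_hazard_rates(P, 50)
for m in (1, 2, 5, 10, 50):
    formatted = ", ".join(f"{name}: {1e4 * r:7.1f}" for name, r in zip(RATINGS, rates[m - 1]))
    print(f"year {m:2d} (in bps) -> {formatted}")
```

Printing the rates for longer horizons shows how the curves of the different ratings move toward a common level.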

FIGURE 3.35: Estimated hazard function $\lambda_i(t)$ from the credit migration matrix


Continuous-time modeling We now consider the case $t \in \mathbb{R}_{+}$. We note $P(s; t)$ the
transition matrix defined as follows:

$$P_{i,j}(s; t) = p(s, i; t, j) = \Pr\{\mathfrak{R}(t) = j \mid \mathfrak{R}(s) = i\}$$

Assuming that the Markov chain is time-homogeneous, we have $P(t) = P(0; t)$. Jarrow et
al. (1997) introduce the generator matrix $\Lambda = (\lambda_{i,j})$ where $\lambda_{i,j} \ge 0$ for all $i \ne j$ and:

$$\lambda_{i,i} = -\sum_{j \ne i}^{K} \lambda_{i,j}$$

In this case, the transition matrix satisfies the following relationship:

$$P(t) = \exp(t \Lambda) \tag{3.48}$$

where $\exp(A)$ is the matrix exponential of $A$. Let us give a probabilistic interpretation of
$\Lambda$. If we assume that the probability of jumping from rating $i$ to rating $j$ in a short time
period $\Delta t$ is proportional to $\Delta t$, we have:

$$p(t, i; t + \Delta t, j) = \lambda_{i,j}\, \Delta t$$

The matrix form of this equation is $P(t; t + \Delta t) = \Lambda\, \Delta t$. We deduce that:

$$P(t + \Delta t) = P(t)\, P(t; t + \Delta t) = P(t)\, \Lambda\, \Delta t$$

and:

$$\mathrm{d}P(t) = P(t)\, \Lambda\, \mathrm{d}t$$

Because we have $\exp(\mathbf{0}) = I$, we obtain the solution $P(t) = \exp(t \Lambda)$. We then interpret
$\lambda_{i,j}$ as the instantaneous transition rate of jumping from rating $i$ to rating $j$.


94 Contrary to what the graph suggests, $\lambda_i(t)$ is a piecewise constant function (see details of the curve in
the fifth panel for very short maturities).


Remark 43 In Appendix A.1.1.3, we present the matrix exponential function and its
mathematical properties. In particular, we have $e^{A + B} = e^{A} e^{B}$ and $e^{A(s + t)} = e^{As} e^{At}$ where $A$
and $B$ are two square matrices such that $AB = BA$ and $s$ and $t$ are two real numbers.


Example 39 We consider a rating system with three states: A (good rating), B (bad rating)
and D (default). The Markov generator is equal to:

$$\Lambda = \begin{pmatrix} -0.30 & 0.20 & 0.10 \\ 0.15 & -0.40 & 0.25 \\ 0.00 & 0.00 & 0.00 \end{pmatrix}$$

The one-year transition probability matrix is equal to:

$$P(1) = e^{\Lambda} = \begin{pmatrix} 75.16\% & 14.17\% & 10.67\% \\ 10.63\% & 68.07\% & 21.30\% \\ 0.00\% & 0.00\% & 100.00\% \end{pmatrix}$$

For the two-year maturity, we get:

$$P(2) = e^{2\Lambda} = \begin{pmatrix} 58.00\% & 20.30\% & 21.71\% \\ 15.22\% & 47.85\% & 36.93\% \\ 0.00\% & 0.00\% & 100.00\% \end{pmatrix}$$

We verify that $P(2) = P(1)^{2}$. This derives from the property of the matrix exponential:

$$P(t) = e^{t\Lambda} = \left(e^{\Lambda}\right)^{t} = P(1)^{t}$$

The continuous-time framework allows us to calculate transition matrices for non-integer
maturities, which do not correspond to full years. For instance, the one-month transition
probability matrix of the previous example is equal to:

$$P\left(\tfrac{1}{12}\right) = e^{\frac{1}{12}\Lambda} = \begin{pmatrix} 97.54\% & 1.62\% & 0.84\% \\ 1.21\% & 96.73\% & 2.05\% \\ 0.00\% & 0.00\% & 100.00\% \end{pmatrix}$$

One of the issues with the continuous-time framework is to estimate the Markov generator
$\Lambda$. One solution consists in using the empirical transition matrix $\hat{P}(t)$, which has
been calculated for a given time horizon $t$. In this case, the estimate $\hat{\Lambda}$ must satisfy the
relationship $\hat{P}(t) = \exp(t \hat{\Lambda})$. We deduce that:

$$\hat{\Lambda} = \frac{1}{t} \ln\left(\hat{P}(t)\right)$$

where $\ln A$ is the matrix logarithm of $A$. However, the matrix $\hat{\Lambda}$ may not satisfy the Markov
conditions $\hat{\lambda}_{i,j} \ge 0$ for all $i \ne j$ and $\sum_{j=1}^{K} \hat{\lambda}_{i,j} = 0$. For instance, if we consider the previous
S&P transition matrix, we obtain the generator $\hat{\Lambda}$ given in Table 3.37. We notice that six
off-diagonal elements of the matrix are negative⁹⁵. This implies that we can obtain transition
probabilities which are negative for short maturities. In this case, Israel et al. (2001) propose
two estimators to obtain a valid generator:


1. the first approach consists in adding the negative values back into the diagonal values:

$$\begin{cases} \bar{\lambda}_{i,j} = \max\left(\hat{\lambda}_{i,j}, 0\right) & i \ne j \\ \bar{\lambda}_{i,i} = \hat{\lambda}_{i,i} + \sum_{j \ne i} \min\left(\hat{\lambda}_{i,j}, 0\right) \end{cases}$$


2. in the second method, we carry forward the negative values on the matrix entries
which have the correct sign:

$$\begin{cases} G_i = \left|\hat{\lambda}_{i,i}\right| + \sum_{j \ne i} \max\left(\hat{\lambda}_{i,j}, 0\right) \\ B_i = \sum_{j \ne i} \max\left(-\hat{\lambda}_{i,j}, 0\right) \end{cases}$$

$$\tilde{\lambda}_{i,j} = \begin{cases} 0 & \text{if } i \ne j \text{ and } \hat{\lambda}_{i,j} < 0 \\ \hat{\lambda}_{i,j} - B_i \left|\hat{\lambda}_{i,j}\right| / G_i & \text{if } G_i > 0 \\ \hat{\lambda}_{i,j} & \text{if } G_i = 0 \end{cases}$$


Using the estimator $\hat{\Lambda}$ and the two previous algorithms, we obtain the valid generators given
in Tables 3.39 and 3.40. We find that $\|\hat{P} - \exp(\bar{\Lambda})\|_{1} = 11.02 \times 10^{-4}$ and
$\|\hat{P} - \exp(\tilde{\Lambda})\|_{1} = 10.95 \times 10^{-4}$, meaning that the Markov generator $\tilde{\Lambda}$ is the estimator that
minimizes the distance to $\hat{P}$. We can then calculate the transition probability matrix for all
maturities, and not only for calendar years. For instance, we report the 207-day transition
probability matrix $P\left(\frac{207}{365}\right) = \exp\left(\frac{207}{365} \tilde{\Lambda}\right)$ in Table 3.41.
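
The two corrections can be sketched as follows and checked against Tables 3.39 and 3.40. The function names are ours, and the input is the rounded generator of Table 3.37 (in bps), so the last digit may differ slightly from the published tables.

```python
# Estimated generator of Table 3.37 (in bps); six off-diagonal entries are negative
GENERATOR_HAT = [
    [-747.49, 703.67, 35.21, 3.04, 6.56, -0.79, -0.22, 0.02],
    [67.94, -859.31, 722.46, 51.60, 2.57, 10.95, 4.92, -1.13],
    [7.69, 245.59, -898.16, 567.70, 53.96, 20.65, -0.22, 2.80],
    [5.07, 21.53, 650.21, -1352.28, 557.64, 85.56, 16.08, 16.19],
    [4.22, 10.22, 41.74, 930.55, -2159.67, 999.62, 97.35, 75.96],
    [-0.84, 11.83, 30.11, 8.71, 818.31, -1936.82, 539.18, 529.52],
    [25.11, -2.89, 44.11, 84.87, 272.05, 1678.69, -5043.00, 2941.06],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
]


def add_negatives_to_diagonal(generator):
    """First method: zero the negative off-diagonal entries, add them back to the diagonal."""
    result = []
    for i, row in enumerate(generator):
        negative_mass = sum(min(x, 0.0) for j, x in enumerate(row) if j != i)
        new_row = [x if j == i else max(x, 0.0) for j, x in enumerate(row)]
        new_row[i] = row[i] + negative_mass
        result.append(new_row)
    return result


def spread_negatives_proportionally(generator):
    """Second method: spread the negative mass over the entries with the correct sign."""
    result = []
    for i, row in enumerate(generator):
        g = abs(row[i]) + sum(max(x, 0.0) for j, x in enumerate(row) if j != i)
        b = sum(max(-x, 0.0) for j, x in enumerate(row) if j != i)
        new_row = []
        for j, x in enumerate(row):
            if j != i and x < 0.0:
                new_row.append(0.0)
            elif g > 0.0:
                new_row.append(x - b * abs(x) / g)
            else:
                new_row.append(x)
        result.append(new_row)
    return result


generator_bar = add_negatives_to_diagonal(GENERATOR_HAT)
generator_tilde = spread_negatives_proportionally(GENERATOR_HAT)
print("AAA row, first method: ", [round(x, 2) for x in generator_bar[0]])
print("AAA row, second method:", [round(x, 2) for x in generator_tilde[0]])
```

Both corrections leave the row sums unchanged, so a generator whose rows sum to zero keeps this property after the adjustment.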


Remark 44 The continuous-time framework is more flexible when modeling credit risk.
For instance, the expression of the survival function becomes:

$$S_i(t) = 1 - \Pr\{\mathfrak{R}(t) = K \mid \mathfrak{R}(0) = i\} = 1 - e_i^{\top} \exp(t \Lambda)\, e_K$$

We can therefore calculate the probability density function in an easier way:

$$f_i(t) = -\partial_t S_i(t) = e_i^{\top} \Lambda \exp(t \Lambda)\, e_K$$

For illustration purposes, we represent the probability density function of S&P ratings
estimated with the valid generator $\tilde{\Lambda}$ in Figure 3.36.


°5We have also calculated the estimator described in Israel et al. (2001): 
= (P-1)" 


he ey — 


n=1 


We do not obtain the same matrix as for the estimator A, but there are also six negative off-diagonal 
elements (see Table 3.38). 
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TABLE 3.37: Markov generator A (in bps) 
AAA AA A BBB BB B CCC D 
AAA —747.49 703.67 35.21 3.04 6.56 —0.79 —0.22 0.02 
AA 67.94 —859.31 722.46 51.60 2.57 10.95 4.92 —1.13 
A 7.69 245.59 —898.16 567.70 53.96 20.65 —0.22 2.80 
BBB 5.07 21.53 650.21 —1352.28 557.64 85.56 16.08 16.19 
BB 4.22 10.22 41.74 930.55 —2159.67 999.62 97.35 75.96 
B —0.84 11.83 30.11 8.71 818.31 —1936.82 539.18 529.52 
CCC 25.11 —2.89 44.11 84.87 272.05 1678.69 —5043.00 2941.06 
D 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
TABLE 3.38: Markov generator A (in bps) 
AAA AA A BBB BB B CCC D 
AAA —745.85 699.11 38.57 2.80 6.27 —0.70 —0.16  —0.05 
AA 67.54 —855.70 716.56 54.37 2.81 10.81 4.62 —1.01 
A 7.77 243.62 —891.46 560.45 56.33 20.70 0.07 2.53 
BBB 5.06 22.68 641.55 —1335.03 542.46 91.05 16.09 16.15 
BB 4.18 10.12 48.00 903.40 —2111.65 965.71 98.28 81.96 
B —0.56 11.61 29.31 19.39 789.99 —1887.69 491.46 546.49 
CCC 23.33 —1.94 42.22 81.25 272.44 1530.66 —4725.22 2777.25 
D 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
TABLE 3.39: Markov generator A (in bps) 
AAA AA A BBB BB B CCC D 
AAA —748.50 703.67 35.21 3.04 6.56 0.00 0.00 0.02 
AA 67.94 —860.44 722.46 51.60 2.57 10.95 4.92 0.00 
A 7.69 245.59 —898.38 567.70 53.96 20.65 0.00 2.80 
BBB 5.07 21.53 650.21 —1352.28 557.64 85.56 16.08 16.19 
BB 4.22 10.22 41.74 930.55 —2159.67 999.62 97.35 75.96 
B 0.00 11.83 30.11 8.71 818.31 —1937.66 539.18 529.52 
CCC 25.11 0.00 44.11 84.87 272.05 1678.69 —5045.89 2941.06 
D 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
TABLE 3.40: Markov generator A (in bps) 
AAA AA A BBB BB B CCC D 
AAA —747.99 703.19 35.19 3.04 6.55 0.00 0.00 0.02 
AA 67.90 —859.88 721.98 51.57 2.57 10.94 4.92 0.00 
A 7.69 245.56 —898.27 567.63 53.95 20.65 0.00 2.80 
BBB 5.07 21.53 650.21 —1352.28 557.64 85.56 16.08 16.19 
BB 4.22 10.22 41.74 930.55 —2159.67 999.62 97.35 75.96 
B 0.00 11.83 30.10 8.71 818.14 —1937.24 539.06 529.40 
CCC 25.10 0.00 44.10 84.84 271.97 1678.21 —5044.45 2940.22 
D 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TABLE 3.41: 207-day transition probability matrix (in %)


AAA AA A BBB BB B CCC D 
AAA 95.85 3.81 0.27 0.03 0.04 0.00 0.00 0.00 
AA 0.37 95.28 3.90 0.34 0.03 0.06 0.02 0.00 
A 0.04 1.33 95.12 3.03 0.33 0.12 0.00 0.02 
BBB 0.03 0.14 3.47 92.75 2.88 0.53 0.09 0.11 
BB 0.02 0.06 0.31 4.79 88.67 5.09 0.53 0.53 
B 0.00 0.06 0.17 0.16 4.16 89.84 2.52 3.08 
CCC 0.12 0.01 0.23 0.45 1.45 7.86 75.24 14.64 
D 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 


3.3.3.3 Structural models 


The previous approaches are purely statistical and are called reduced-form models. We 
now consider economic models for modeling default times. These approaches are based on 
accounting and market data and are called structural models. 


The Merton model The structural approach of credit risk has been formalized by Mer- 
ton (1974). In this framework, the bond holders will liquidate the corporate firm if the asset 
value A (t) goes below a threshold B related to the total amount of debt. The underlying 
idea is that bond holders monitor the asset value and compare A (t) to the default barrier 
B. 


Merton (1974) assumes that the dynamics of the assets $A(t)$ follows a geometric Brownian motion:

$$\mathrm{d}A(t) = \mu_A A(t)\,\mathrm{d}t + \sigma_A A(t)\,\mathrm{d}W(t)$$

where $A(0) = A_0$. The default occurs if the asset value $A(t)$ falls under the threshold $B$:

$$\tau := \inf\{t : A(t) < B\}$$


In this case, the bond holders receive $A(T)$ and lose $B - A(T)$. The payoff of bond holders is then equal to:

$$D = B - \max(B - A(T), 0)$$

where $D$ is the debt value of maturity $T$. The holding of a risky bond can be interpreted as a trading strategy where we have bought a zero-coupon bond and financed the cost by selling a put on $A(t)$ with an exercise price $B$ and a maturity $T$. From the viewpoint of the equity holders, the payoff is equal to $\max(A(T) - D, 0)$. The holding of an equity share can be interpreted as a trading strategy where we have bought a call option with a strike equal to the debt value $D$. It follows that the current value $E_0$ of the equity is:

$$E_0 = e^{-rT}\, \mathbb{E}[\max(A(T) - D, 0)] = A_0 \Phi(d_1) - e^{-rT} D\, \Phi(d_2)$$

where:

$$d_1 = \frac{\ln A_0 - \ln D + rT}{\sigma_A \sqrt{T}} + \frac{1}{2} \sigma_A \sqrt{T}$$

and $d_2 = d_1 - \sigma_A \sqrt{T}$. We notice that the equity value depends on the current asset value $A_0$, the leverage ratio $L = D/A_0$, the asset volatility $\sigma_A$ and the time of repayment $T$.

FIGURE 3.36: Probability density function $f_i(t)$ of S&P ratings


The KMV implementation In the nineties, the Merton model was implemented by KMV^96 with a lot of success. The underlying idea of the KMV implementation is to estimate the default probability of a firm. One of the difficulties is to estimate the asset volatility $\sigma_A$. However, Jones et al. (1984) show that it is related to the equity volatility $\sigma_E$. Indeed, we have $E(t) = C(t, A(t))$, implying that:

$$\begin{aligned} \mathrm{d}E(t) = {} & \partial_t C(t, A(t))\,\mathrm{d}t + \mu_A A(t)\, \partial_A C(t, A(t))\,\mathrm{d}t + \\ & \frac{1}{2} \sigma_A^2 A^2(t)\, \partial_A^2 C(t, A(t))\,\mathrm{d}t + \sigma_A A(t)\, \partial_A C(t, A(t))\,\mathrm{d}W(t) \end{aligned}$$

Since the stochastic term is also equal to $\sigma_E E(t)\,\mathrm{d}W(t)$ and the delta of the call option is $\partial_A C(0, A_0) = \Phi(d_1)$, we obtain the following equality at time $t = 0$:

$$\sigma_E E_0 = \sigma_A A_0 \Phi(d_1)$$

Therefore, Crosbie and Bohn (2002) deduce the following system of equations:

$$\left\{ \begin{array}{l} A_0 \Phi(d_1) - e^{-rT} D\, \Phi(d_2) - E_0 = 0 \\ \sigma_E E_0 - \sigma_A A_0 \Phi(d_1) = 0 \end{array} \right. \qquad (3.49)$$

Once we have estimated $A_0$ and $\sigma_A$, we can calculate the survival function:

$$S(t) = \Pr\{A(t) > D \mid A(0) = A_0\} = \Phi\left(\frac{\ln A_0 - \ln D + \mu_A t}{\sigma_A \sqrt{t}} - \frac{1}{2} \sigma_A \sqrt{t}\right)$$

and deduce the probability of default $F(t) = 1 - S(t)$ and the distance to default $DD(t) = \Phi^{-1}(S(t))$.


96 KMV was a company dedicated to credit risk modeling, and was founded by Stephen Kealhofer, John McQuown and Oldrich Vasicek. In 2002, they sold KMV to Moody’s.

Example 40 Crosbie and Bohn (2002) assume that the market capitalization $E_0$ of the firm is \$3 bn, its debt liability $D$ is \$10 bn, the corresponding equity volatility $\sigma_E$ is equal to 40%, the maturity $T$ is one year and the expected return $\mu_A$ is set to 7%.

Using an interest rate $r = 5\%$ and solving Equation (3.49), we find that the asset value $A_0$ is equal to \$12.512 bn and the implied asset volatility $\sigma_A$ is equal to 9.609%. Therefore, we can calculate the distance-to-default $DD(1) = 3.012$ and the one-year probability $PD(1) = 12.96$ bps. In Figure 3.37, we report the probability of default for different time horizons. We also show the impact of the equity volatility $\sigma_E$ and the expected return $\mu_A$, which can be interpreted as a return-on-equity ratio (ROE). We verify that the probability of default is an increasing function of the volatility risk and a decreasing function of the profitability.
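The figures of Example 40 can be reproduced with a few lines of code. The sketch below solves the system (3.49) with a simple fixed-point iteration, which is an assumption of this illustration and not necessarily the numerical method used by Crosbie and Bohn (2002); the function names `solve_kmv` and `distance_to_default` are hypothetical. The distance-to-default is computed with the drift $\mu_A$ and the convexity term $-\sigma_A \sqrt{t}/2$, which is the convention that matches $DD(1) = 3.012$.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def solve_kmv(E0, D, sigma_E, r, T, n_iter=200):
    """Solve system (3.49) for (A0, sigma_A) by fixed-point iteration."""
    A0 = E0 + D * exp(-r * T)          # starting point: Phi(d1) = Phi(d2) = 1
    sigma_A = sigma_E * E0 / A0
    for _ in range(n_iter):
        d1 = (log(A0 / D) + r * T) / (sigma_A * sqrt(T)) + 0.5 * sigma_A * sqrt(T)
        d2 = d1 - sigma_A * sqrt(T)
        A0 = (E0 + exp(-r * T) * D * PHI(d2)) / PHI(d1)   # first equation
        sigma_A = sigma_E * E0 / (A0 * PHI(d1))           # second equation
    return A0, sigma_A


def distance_to_default(A0, D, mu_A, sigma_A, t):
    """DD(t) = Phi^{-1}(S(t)) under the geometric Brownian motion."""
    return (log(A0 / D) + mu_A * t) / (sigma_A * sqrt(t)) - 0.5 * sigma_A * sqrt(t)


if __name__ == "__main__":
    A0, sigma_A = solve_kmv(E0=3.0, D=10.0, sigma_E=0.40, r=0.05, T=1.0)
    dd = distance_to_default(A0, 10.0, 0.07, sigma_A, 1.0)
    pd_bps = 1e4 * PHI(-dd)
    print(A0, sigma_A, dd, pd_bps)
```

With the inputs of Example 40, the iteration should converge to values close to those reported in the text ($A_0 \approx 12.512$, $\sigma_A \approx 9.609\%$).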


[Plot: PD (in bps) against t (in years), one curve per set of parameters ($\sigma_E$, $\mu_A$)]


FIGURE 3.37: Probability of default in the KMV model 


Remark 45 The KMV model is more complex than the presentation above. In particular, 
the key variable is not the probability of default, but the distance-to-default (see Figure 3.38). 
Once this measure is calculated, it is converted into an expected default frequency (EDF) 
by considering an empirical distribution of PD conditionally to the distance-to-default. For 
instance, DD (1) = 4 is equivalent to PD (1) = 100 bps (Crosbie and Bohn, 2002). 


The CreditGrades implementation The CreditGrades approach is an extension of the Merton model. It uses the framework of Black and Cox (1976) and has been developed by Finkelstein et al. (2002). They assume that the asset-per-share value $A(t)$ is a geometric Brownian motion without drift:

$$\mathrm{d}A(t) = \sigma_A A(t)\,\mathrm{d}W(t)$$

whereas the default barrier $B$ is defined as the recovery value of bond holders. $B$ is equal to the product $R \cdot D$, where $R \in [0, 1]$ is the recovery rate and $D$ is the debt-per-share value.

FIGURE 3.38: Distance-to-default in the KMV model

They also assume that $R$ and $A(t)$ are independent and $R \sim \mathcal{LN}(\mu_R, \sigma_R)$. We recall that the default time is defined by:

$$\tau := \inf\{t \geq 0 : t \in \mathcal{D}\}$$

where $\mathcal{D} = \{A(t) < B\}$. Since we have $A(t) = A_0 e^{\sigma_A W(t) - \sigma_A^2 t/2}$ and $B = D e^{\mu_R + \sigma_R \varepsilon}$ where $\varepsilon \sim \mathcal{N}(0, 1)$, it follows that:

$$\mathcal{D} = \left\{ A_0 e^{\sigma_A W(t) - \sigma_A^2 t/2} < D e^{\mu_R + \sigma_R \varepsilon} \right\}$$

The authors introduce the average recovery rate $\bar{R} = \mathbb{E}[R] = e^{\mu_R + \sigma_R^2/2}$. We deduce that:

$$\begin{aligned} \mathcal{D} &= \left\{ A_0 e^{\sigma_A W(t) - \sigma_A^2 t/2} < \bar{R} D e^{\sigma_R \varepsilon - \sigma_R^2/2} \right\} \\ &= \left\{ A_0 e^{\sigma_A W(t) - \sigma_A^2 t/2 - \sigma_R \varepsilon + \sigma_R^2/2} < \bar{R} D \right\} \end{aligned} \qquad (3.50)$$
Finkelstein et al. (2002) introduce the process $X(t)$ defined by:

$$X(t) = \sigma_A W(t) - \frac{1}{2} \sigma_A^2 t - \sigma_R \varepsilon - \frac{1}{2} \sigma_R^2$$

It follows that Inequality (3.50) becomes:

$$\mathcal{D} = \left\{ X(t) < \ln(\bar{R} D) - \ln\left(A_0 e^{\sigma_R^2}\right) \right\}$$

By assuming that $X(t)$ can be approximated by a Brownian motion with drift $-\sigma_A^2/2$ and diffusion rate $\sigma_A$, we can show that^97:

$$S(t) = \Phi\left(-\frac{\sigma(t)}{2} + \frac{\ln d}{\sigma(t)}\right) - d \cdot \Phi\left(-\frac{\sigma(t)}{2} - \frac{\ln d}{\sigma(t)}\right)$$

where $\sigma(t) = \sqrt{\sigma_A^2 t + \sigma_R^2}$ and:

$$d = \frac{A_0 e^{\sigma_R^2}}{\bar{R} D}$$

This survival function is then calibrated by assuming that $A_0 = S_0 + \bar{R} D$ and:

$$\sigma_A = \sigma_S \frac{S^\star}{S^\star + \bar{R} D}$$

where $S_0$ is the current stock price, $S^\star$ is the reference stock price and $\sigma_S$ is the stock (implied or historical) volatility. All the parameters ($S_0$, $S^\star$, $\sigma_S$, $\bar{R}$, $D$) are easy to calibrate, except the volatility of the recovery rate $\sigma_R$. We have:

$$\sigma_R^2 = \operatorname{var}(\ln R) = \operatorname{var}(\ln B)$$

We deduce that $\sigma_R$ is the uncertainty of the default barrier $B$.

FIGURE 3.39: Probability of default in the CreditGrades model


97 By considering the reflection principle and Equation (A.24) defined on page 1074, we deduce that:

$$\Pr\left\{\inf_{s \leq t} \left(\mu s + \sigma W(s)\right) > c\right\} = \Phi\left(\frac{\mu t - c}{\sigma \sqrt{t}}\right) - e^{2\mu c/\sigma^2}\, \Phi\left(\frac{\mu t + c}{\sigma \sqrt{t}}\right)$$

The expression of $S(t)$ is obtained by setting $\mu = -\sigma_A^2/2$, $\sigma = \sigma_A$ and $c = \ln(\bar{R} D) - \ln(A_0 e^{\sigma_R^2})$, and using the change of variable $u = t + \sigma_R^2/\sigma_A^2$.

In Figure 3.39, we illustrate the CreditGrades model by computing the probability of default when $S_0 = 100$, $S^\star = 100$, $\sigma_S = 20\%$, $\bar{R} = 50\%$, $\sigma_R = 10\%$ and $D = 100$. We notice that $PD(t)$ is an increasing function of $S^\star$, $\sigma_S$, $\bar{R}$ and $\sigma_R$. The impact of the recovery rate may seem surprising, but bond holders may be encouraged to cause the default when the recovery rate is high.
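The CreditGrades survival function is a closed-form expression, so the sensitivities discussed above are easy to check numerically. The following sketch assumes the formulas $S(t) = \Phi(-\sigma(t)/2 + \ln d/\sigma(t)) - d\,\Phi(-\sigma(t)/2 - \ln d/\sigma(t))$, $d = A_0 e^{\sigma_R^2}/(\bar{R} D)$, $A_0 = S_0 + \bar{R} D$ and $\sigma_A = \sigma_S S^\star/(S^\star + \bar{R} D)$; the function name and the argument names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def creditgrades_survival(t, S0, S_ref, sigma_S, R_bar, sigma_R, D):
    """Survival probability S(t) in the CreditGrades model."""
    A0 = S0 + R_bar * D                              # calibrated asset value
    sigma_A = sigma_S * S_ref / (S_ref + R_bar * D)  # calibrated asset volatility
    d = A0 * exp(sigma_R ** 2) / (R_bar * D)
    s = sqrt(sigma_A ** 2 * t + sigma_R ** 2)        # sigma(t)
    return PHI(-s / 2 + log(d) / s) - d * PHI(-s / 2 - log(d) / s)


# Parameters used for Figure 3.39 in the text.
BASE = dict(S0=100.0, S_ref=100.0, sigma_S=0.20, R_bar=0.50, sigma_R=0.10, D=100.0)

if __name__ == "__main__":
    for t in (1.0, 5.0, 10.0):
        pd_bps = 1e4 * (1.0 - creditgrades_survival(t, **BASE))
        print(f"t = {t:4.1f} years, PD = {pd_bps:.2f} bps")
```

Changing one entry of `BASE` at a time (for instance `sigma_S` or `sigma_R`) gives the direction of each sensitivity.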


Relationship with intensity (or reduced-form) models Let $\lambda(s)$ be a positive continuous process. We define the default time by $\tau := \inf\{t \geq 0 : \int_0^t \lambda(s)\,\mathrm{d}s \geq \theta\}$ where $\theta$ is a standard exponential random variable. We have:

$$\begin{aligned} S(t) &= \Pr\{\tau > t\} \\ &= \Pr\left\{\int_0^t \lambda(s)\,\mathrm{d}s < \theta\right\} \\ &= \mathbb{E}\left[\exp\left(-\int_0^t \lambda(s)\,\mathrm{d}s\right)\right] \end{aligned}$$

Let $\Lambda(t) = \int_0^t \lambda(s)\,\mathrm{d}s$ be the integrated hazard function. If $\lambda(s)$ is deterministic, we obtain $S(t) = \exp(-\Lambda(t))$. In particular, if $\lambda(s)$ is a piecewise constant function, we obtain the piecewise exponential model.

FIGURE 3.40: Intensity models and the default barrier issue


We now consider the stochastic case $\lambda(t) = \sigma W^2(t)$ where $W(t)$ is a Brownian motion. In Figure 3.40, we illustrate the simulation mechanism of defaults. First, we simulate the exponential variable $B$, which plays the role of the exponential threshold in the definition of the default time. In our example, it is equal to 1.157. Second, we simulate the Brownian motion $W(t)$ (top/left panel). Then, we calculate $\lambda(t)$ where $\sigma = 1.5\%$ (top/right panel), and the integrated hazard function $\Lambda(t)$ (bottom/left panel). Finally, we determine the default time when the integrated hazard function crosses the barrier $B$. In our example, $\tau$ is equal to 3.30. In fact, the simulation mechanism may be confusing. Indeed, we have the impression that we know the barrier $B$, implying that the default is predictable. In intensity models, the opposite is true. We do not know the stochastic barrier $B$, but the occurrence of the default unveils the barrier $B$ as illustrated in the bottom/right panel in Figure 3.40. In structural models, we assume that the barrier $B$ is known and we can predict the default time because we observe the distance to the barrier. Intensity and structural models are then two sides of the same coin. They use the same concept of default barrier, but its interpretation is completely different.
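The simulation mechanism described above can be sketched as follows. This is only an illustration under stated assumptions: the time grid, the left-point Riemann sum for the integrated hazard and the function names `first_crossing_time` and `simulate_default_time` are choices made here, and the output will not reproduce the particular path of Figure 3.40.

```python
import random
from math import sqrt


def first_crossing_time(hazard_path, barrier, dt):
    """First grid time at which the integrated hazard reaches the barrier.

    Returns None if the barrier is not reached on the simulated horizon.
    """
    integrated = 0.0
    for k, lam in enumerate(hazard_path, start=1):
        integrated += lam * dt          # Riemann sum of the hazard rate
        if integrated >= barrier:
            return k * dt
    return None


def simulate_default_time(sigma, horizon, dt, rng):
    """Simulate (barrier, tau) when lambda(t) = sigma * W(t)^2."""
    barrier = rng.expovariate(1.0)      # standard exponential threshold
    w = 0.0
    hazard_path = []
    for _ in range(int(horizon / dt)):
        w += rng.gauss(0.0, sqrt(dt))   # Brownian increment
        hazard_path.append(sigma * w * w)
    return barrier, first_crossing_time(hazard_path, barrier, dt)


if __name__ == "__main__":
    rng = random.Random(123)
    barrier, tau = simulate_default_time(sigma=0.015, horizon=50.0, dt=0.01, rng=rng)
    print("barrier:", barrier, "default time:", tau)
```

In the code the barrier is drawn first for convenience, but an observer of the default only learns its value at the crossing time, which is the point made in the text.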


3.3.4 Default correlation 


In this section, we consider the modeling of default correlations, which corresponds 
essentially to two approaches: the copula model and the factor model. Then, we see how to 
estimate default correlations. Finally, we show how to consider the dependence of default 
times in the pricing of basket derivatives. 


3.3.4.1 The copula model 


Copula functions are extensively studied in Chapter 11, and we invite the reader to examine this chapter before going further. Let $F$ be the joint distribution of the random vector $(X_1, \ldots, X_n)$. We show on page 719 that $F$ admits a copula representation:

$$F(x_1, \ldots, x_n) = \mathbf{C}(F_1(x_1), \ldots, F_n(x_n))$$

where $F_i$ is the marginal distribution of $X_i$ and $\mathbf{C}$ is the copula function associated to $F$. Since there is a strong relationship between probability distributions and survival functions, we can also show that the survival function $S$ of the random vector $(\tau_1, \ldots, \tau_n)$ has a copula representation:

$$S(t_1, \ldots, t_n) = \breve{\mathbf{C}}(S_1(t_1), \ldots, S_n(t_n))$$

where $S_i$ is the survival function of $\tau_i$ and $\breve{\mathbf{C}}$ is the survival copula associated to $S$. The copula $\breve{\mathbf{C}}$ is unique if the marginals are continuous. The copula functions $\mathbf{C}$ and $\breve{\mathbf{C}}$ are not necessarily the same, except when the copula $\mathbf{C}$ is radially symmetric (Nelsen, 2006). This is for example the case of the Normal (or Gaussian) copula and the Student’s t copula. Since these two copula functions are the only ones that are really used by professionals^98, we assume that $\breve{\mathbf{C}} = \mathbf{C}$ in the sequel.


The Basel model We have seen that the Basel framework for modeling the credit risk is derived from the Merton model. Let $Z_i \sim \mathcal{N}(0, 1)$ be the (normalized) asset value of the $i$-th firm. In the Merton model, the default occurs when $Z_i$ is below a non-stochastic barrier $B_i$:

$$D_i = 1 \Leftrightarrow Z_i < B_i$$

The Basel Committee assumes that $Z_i = \sqrt{\rho} X + \sqrt{1 - \rho}\, \varepsilon_i$ where $X \sim \mathcal{N}(0, 1)$ is the systematic risk factor and $\varepsilon_i \sim \mathcal{N}(0, 1)$ is the specific risk factor. We have shown that the default barrier $B_i$ is equal to $\Phi^{-1}(p_i)$ where $p_i$ is the unconditional default probability. We have also demonstrated that the conditional default probability is equal to:

$$p_i(X) = \Phi\left(\frac{\Phi^{-1}(p_i) - \sqrt{\rho} X}{\sqrt{1 - \rho}}\right)$$


98 They can also use some Archimedean copulas that are not radially symmetric such as the Clayton copula, but it generally concerns credit portfolios with a small number of exposures.

Remark 46 In the Basel framework, we assume a fixed maturity. If we introduce the time dimension, we obtain:

$$p_i(t) = \Pr\{\tau_i \leq t\} = 1 - S_i(t)$$

and:

$$p_i(t, X) = \Phi\left(\frac{\Phi^{-1}(1 - S_i(t)) - \sqrt{\rho} X}{\sqrt{1 - \rho}}\right)$$

where $S_i(t)$ is the survival function of the $i$-th firm.


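The vector of assets $Z = (Z_1, \ldots, Z_n)$ is Gaussian with a constant covariance matrix $\mathcal{C} = \mathcal{C}_n(\rho)$:

$$\mathcal{C} = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{pmatrix}$$

It follows that the joint default probability is:

$$\begin{aligned} p_{1,\ldots,n} &= \Pr\{D_1 = 1, \ldots, D_n = 1\} \\ &= \Pr\{Z_1 < B_1, \ldots, Z_n < B_n\} \\ &= \Phi(B_1, \ldots, B_n; \mathcal{C}) \end{aligned}$$

Since we have $B_i = \Phi^{-1}(p_i)$, we deduce that the Basel copula between the default indicator functions is a Normal copula, whose parameters are a constant correlation matrix $\mathcal{C}_n(\rho)$:

$$\begin{aligned} p_{1,\ldots,n} &= \Phi(\Phi^{-1}(p_1), \ldots, \Phi^{-1}(p_n); \mathcal{C}) \\ &= \mathbf{C}(p_1, \ldots, p_n; \mathcal{C}_n(\rho)) \end{aligned}$$

Let us now consider the dependence between the survival times:

$$\begin{aligned} S(t_1, \ldots, t_n) &= \Pr\{\tau_1 > t_1, \ldots, \tau_n > t_n\} \\ &= \Pr\{Z_1 > \Phi^{-1}(p_1(t_1)), \ldots, Z_n > \Phi^{-1}(p_n(t_n))\} \\ &= \mathbf{C}(1 - p_1(t_1), \ldots, 1 - p_n(t_n); \mathcal{C}) \\ &= \mathbf{C}(S_1(t_1), \ldots, S_n(t_n); \mathcal{C}_n(\rho)) \end{aligned}$$

The dependence between the default times is again the Normal copula with the matrix of parameters $\mathcal{C}_n(\rho)$.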


Extension to other copula models The Basel model assumes that the asset correlation is the same between the different firms. A first extension is to consider that the dependence between the default times remains a Normal copula, but with a general correlation matrix:

$$\mathcal{C} = \begin{pmatrix} 1 & \rho_{1,2} & \cdots & \rho_{1,n} \\ & 1 & & \vdots \\ & & \ddots & \rho_{n-1,n} \\ & & & 1 \end{pmatrix} \qquad (3.51)$$

This approach is explicitly proposed by Li (2000), but it was already implemented in CreditMetrics (Gupton et al., 1997). The correlation matrix can be estimated using a structural model or approximated by the correlation of stock returns. However, this approach is only valid for publicly traded companies and is not always stable. This is why professionals prefer to use direct extensions of the one-factor model.


Let $X_j$ be a Gaussian factor where $j = 1, \ldots, m$. We assume that the asset value $Z_i$ depends on one of these common risk factors:

$$Z_i = \sum_{j=1}^{m} \beta_{i,j} X_j + \varepsilon_i \qquad (3.52)$$

with $\sum_{j=1}^{m} \mathbb{1}\{\beta_{i,j} > 0\} = 1$. We assume that the common risk factors are correlated with each other, but they are independent of the specific risks $(\varepsilon_1, \ldots, \varepsilon_n)$, which are by definition not correlated. For instance, $X_j$ can represent the systematic risk factor of the $j$-th sector or industry. Of course, we can extend this approach to a higher dimension such as sector $\times$ region. For example, if we consider three sectors ($S_1$, $S_2$ and $S_3$) and two geographical regions ($R_1$ and $R_2$), we obtain six common risk factors:


These risk factors can then be seen as composite sectors. We note $\mathrm{map}(i)$ the mapping function, which indicates the composite sector $j$ (or the risk factor $j$): $\mathrm{map}(i) = j$ if $i \in X_j$. We assume that the dependence between the default times $(\tau_1, \ldots, \tau_n)$ is a Normal copula function, whose correlation matrix $\mathcal{C}$ is equal to:

$$\mathcal{C} = \begin{pmatrix} 1 & \rho(\mathrm{map}(1), \mathrm{map}(2)) & \cdots & \rho(\mathrm{map}(1), \mathrm{map}(n)) \\ & 1 & & \vdots \\ & & \ddots & \rho(\mathrm{map}(n-1), \mathrm{map}(n)) \\ & & & 1 \end{pmatrix} \qquad (3.53)$$

In practice, we have $m < n$ and many elements of the correlation matrix $\mathcal{C}$ are equal. In fact, there are only $m \times (m + 1)/2$ different values, which correspond to inter-sector and intra-sector correlations.


Example 41 Let us consider the case of four sectors:

Factor    X_1    X_2    X_3    X_4
X_1       30%    20%    10%     0%
X_2              40%    30%    20%
X_3                     50%    10%
X_4                            60%

The diagonal elements are the intra-sector correlations, whereas the off-diagonal elements are the inter-sector correlations.


If the portfolio is composed of seven loans of corporate firms that belong to the following sectors:

Loan i            1    2    3    4    5    6    7
Sector map(i)     1    1    2    3    3    3    4

we obtain the following correlation matrix:

$$\mathcal{C} = \begin{pmatrix}
1.00 & 0.30 & 0.20 & 0.10 & 0.10 & 0.10 & 0.00 \\
 & 1.00 & 0.20 & 0.10 & 0.10 & 0.10 & 0.00 \\
 & & 1.00 & 0.30 & 0.30 & 0.30 & 0.20 \\
 & & & 1.00 & 0.50 & 0.50 & 0.10 \\
 & & & & 1.00 & 0.50 & 0.10 \\
 & & & & & 1.00 & 0.10 \\
 & & & & & & 1.00
\end{pmatrix}$$

Simulation of copula models With the exception of the Normal copula with a constant correlation matrix and an infinitely fine-grained portfolio, we cannot calculate analytically the value-at-risk or the expected shortfall of the portfolio loss. In this case, we consider Monte Carlo methods, and we use the method of transformation for simulating copula functions^99. Since we have $S_i(\tau_i) \sim \mathcal{U}_{[0,1]}$, the simulation of correlated default times is obtained with the following algorithm:

1. we simulate the random vector $(u_1, \ldots, u_n)$ from the copula function $\mathbf{C}$;
2. we set $\tau_i = S_i^{-1}(u_i)$.

In many cases, we do not need to simulate the default time $\tau_i$, but the indicator function $D_i(t_i) = \mathbb{1}\{\tau_i \leq t_i\}$. Indeed, $D_i$ is a Bernoulli random variable with parameter $F_i(t) = 1 - S_i(t)$, implying that $D(t) = (D_1(t_1), \ldots, D_n(t_n))$ is a Bernoulli random vector with parameter $p(t) = (p_1(t_1), \ldots, p_n(t_n))$. Since the copula of $D(t)$ is the copula of the random vector $\tau = (\tau_1, \ldots, \tau_n)$, we obtain the following algorithm to simulate correlated indicator functions:

1. we simulate the random vector $(u_1, \ldots, u_n)$ from the copula function $\mathbf{C}$;
2. we set $D_i(t_i) = 1$ if $u_i > S_i(t_i)$.

In the case of the Normal copula, the simulation of $u = (u_1, \ldots, u_n)$ requires calculating the Cholesky decomposition of the correlation matrix $\mathcal{C}$. However, this approach is only valid for a small size $n$ of the credit portfolio, because we are rapidly limited by the memory storage capacity of the computer. In a 32-bit computer, the storage of a double requires 8 bytes, meaning that the storage of a $n \times n$ Cholesky matrix requires 78.125 KB if $n = 100$, 7.629 MB if $n = 1\,000$, 762.94 MB if $n = 10\,000$, etc. It follows that the traditional Cholesky algorithm is not adapted when considering a large credit portfolio. However, if we consider the Basel model, we can simulate the correlated default times using the following [BASEL] algorithm:

1. we simulate $n + 1$ Gaussian independent random variables $X$ and $(\varepsilon_1, \ldots, \varepsilon_n)$;

2. we simulate the Basel copula function:

$$(u_1, \ldots, u_n) = \left(\Phi\left(\sqrt{\rho} X + \sqrt{1 - \rho}\, \varepsilon_1\right), \ldots, \Phi\left(\sqrt{\rho} X + \sqrt{1 - \rho}\, \varepsilon_n\right)\right) \qquad (3.54)$$

3. we set $\tau_i = S_i^{-1}(u_i)$.
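The three steps of the [BASEL] algorithm can be sketched as follows. To keep the inversion $S_i^{-1}$ explicit, the sketch assumes exponential default times with $S_i(t) = e^{-\lambda_i t}$, which is an assumption of this illustration; the function name `simulate_basel_default_times` is hypothetical.

```python
import random
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def simulate_basel_default_times(hazard_rates, rho, rng):
    """One draw of correlated default times with the one-factor Normal copula."""
    x = rng.gauss(0.0, 1.0)                      # step 1: common factor X
    times = []
    for lam in hazard_rates:
        eps = rng.gauss(0.0, 1.0)                # step 1: specific factor
        u = PHI(sqrt(rho) * x + sqrt(1.0 - rho) * eps)   # step 2: Equation (3.54)
        u = max(u, 1e-300)                       # guard against log(0)
        times.append(-log(u) / lam)              # step 3: tau = S^{-1}(u)
    return times


if __name__ == "__main__":
    rng = random.Random(2024)
    print(simulate_basel_default_times([0.01, 0.02, 0.05], rho=0.20, rng=rng))
```

Only $n + 1$ Gaussian numbers are needed per scenario and no $n \times n$ matrix is stored, which is the memory advantage mentioned in the text.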


99 See Section 13.1.3.2 on page 802.

The [BASEL] algorithm is the efficient way to simulate the one-factor model and demonstrates that we do not always need to use the Cholesky decomposition for simulating the Normal (or the Student’s t) copula function. Let us generalize the [BASEL] algorithm when we consider the Normal copula with the correlation matrix given by Equation (3.53). The eigendecomposition of $\mathcal{C}$ is equal to $V \Lambda V^\top$, where $V$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues. Let $u$ be a vector of $n$ independent Gaussian standardized random numbers. Then, $Z = V \Lambda^{1/2} u$ is a Gaussian vector with correlation $\mathcal{C}$. We note $\mathcal{C}^\star = (\rho^\star_{j_1, j_2})$ the $m \times m$ correlation matrix based on intra- and inter-sector correlations^100 and we consider the corresponding eigendecomposition $\mathcal{C}^\star = V^\star \Lambda^\star V^{\star\top}$. Let $X^\star$ be a $m \times 1$ Gaussian standardized random vector. It follows that the random vector $Z = (Z_1, \ldots, Z_n)$ is a Gaussian random vector with correlation matrix $\mathcal{C} = \mathrm{map}(\mathcal{C}^\star)$ where^101:

$$Z_i = \sum_{j=1}^{m} A^\star_{\mathrm{map}(i), j} X^\star_j + \sqrt{1 - \rho^\star_{\mathrm{map}(i), \mathrm{map}(i)}}\, \varepsilon_i$$

and $A^\star = V^\star (\Lambda^\star)^{1/2}$ and $V^\star$ are the $L_2$-normalized eigenvectors. The [EIG] algorithm proposed by Jouanin et al. (2004) consists then in replacing the second step of the [BASEL] algorithm:

1. we simulate $n + m$ Gaussian independent random variables $(X^\star_1, \ldots, X^\star_m)$ and $(\varepsilon_1, \ldots, \varepsilon_n)$;

2. for the $i$-th credit, we calculate:

$$Z_i = \sum_{j=1}^{m} A^\star_{\mathrm{map}(i), j} X^\star_j + \sqrt{1 - \rho^\star_{\mathrm{map}(i), \mathrm{map}(i)}}\, \varepsilon_i$$

3. we simulate the copula function:

$$(u_1, \ldots, u_n) = (\Phi(Z_1), \ldots, \Phi(Z_n))$$

4. we set $\tau_i = S_i^{-1}(u_i)$.


Here is a comparison of the efficiency of the [EIG] algorithm with respect to the traditional [CHOL] algorithm:

Algorithm    Matrix        Random      Number of operations
             dimension     numbers     +               ×
CHOL         n × n         n           n × (n − 1)     n × n
EIG          m × m         n + m       n × (m + 1)     n × (m + 1)

10 000 loans + 20 sectors
CHOL         10^8          10 000      ≈ 10^8          10^8
EIG          400           10 020      2.1 × 10^5      2.1 × 10^5

These results explain why the [EIG] algorithm is faster than the [CHOL] algorithm^102. We also notice that the [EIG] algorithm corresponds to the [BASEL] algorithm in the case $m = 1$ when there is only one common factor.


100 The diagonal elements correspond to intra-sector correlations, whereas the off-diagonal elements correspond to inter-sector correlations.

101 Jouanin et al. (2004) showed that if the eigenvalues of $\mathcal{C}^\star$ are positive, then $\mathcal{C} = \mathrm{map}(\mathcal{C}^\star)$ is a correlation matrix.

102 On average, the computational time is divided by a factor of $n/m$.

Let us consider Example 41. We obtain:

$$A^\star = \begin{pmatrix}
-0.2633 & 0.1302 & -0.3886 & 0.2504 \\
-0.5771 & -0.1980 & -0.1090 & 0.1258 \\
-0.5536 & 0.0943 & 0.3281 & 0.2774 \\
-0.4897 & 0.0568 & -0.0335 & -0.5965
\end{pmatrix}$$

We deduce that the second step of the [EIG] algorithm is:

• if the credit belongs to the first sector, we simulate $Z_i$ as follows:

$$Z_i = -0.263 \cdot X^\star_1 + 0.130 \cdot X^\star_2 - 0.389 \cdot X^\star_3 + 0.250 \cdot X^\star_4 + 0.837 \cdot \varepsilon_i$$

• if the credit belongs to the second sector, we simulate $Z_i$ as follows:

$$Z_i = -0.577 \cdot X^\star_1 - 0.198 \cdot X^\star_2 - 0.109 \cdot X^\star_3 + 0.126 \cdot X^\star_4 + 0.775 \cdot \varepsilon_i$$

• if the credit belongs to the third sector, we simulate $Z_i$ as follows:

$$Z_i = -0.554 \cdot X^\star_1 + 0.094 \cdot X^\star_2 + 0.328 \cdot X^\star_3 + 0.277 \cdot X^\star_4 + 0.707 \cdot \varepsilon_i$$

• if the credit belongs to the fourth sector, we simulate $Z_i$ as follows:

$$Z_i = -0.490 \cdot X^\star_1 + 0.057 \cdot X^\star_2 - 0.034 \cdot X^\star_3 - 0.597 \cdot X^\star_4 + 0.632 \cdot \varepsilon_i$$
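The role of the matrix $A^\star$ can be checked numerically: the product $A^\star A^{\star\top}$ should give back the sector correlations of Example 41, and each row of $A^\star$ provides the loadings of one sector. The sketch below hard-codes the matrix reported above (rounded to four decimals) instead of computing the eigendecomposition; the names `A_STAR`, `C_STAR`, `SECTOR`, `implied_sector_matrix` and `simulate_eig_uniforms` are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

# Loadings A* = V* (Lambda*)^{1/2} reported for Example 41 (four decimals).
A_STAR = [
    [-0.2633, 0.1302, -0.3886, 0.2504],
    [-0.5771, -0.1980, -0.1090, 0.1258],
    [-0.5536, 0.0943, 0.3281, 0.2774],
    [-0.4897, 0.0568, -0.0335, -0.5965],
]
C_STAR = [
    [0.30, 0.20, 0.10, 0.00],
    [0.20, 0.40, 0.30, 0.20],
    [0.10, 0.30, 0.50, 0.10],
    [0.00, 0.20, 0.10, 0.60],
]
SECTOR = [1, 1, 2, 3, 3, 3, 4]  # sector of each loan (1-based)


def implied_sector_matrix(a_star):
    """Return A* A*', which should reproduce C* up to rounding."""
    m = len(a_star)
    return [[sum(a_star[i][k] * a_star[j][k] for k in range(m)) for j in range(m)]
            for i in range(m)]


def simulate_eig_uniforms(a_star, sector, rng):
    """Steps 1 to 3 of the [EIG] algorithm: one draw of (u_1, ..., u_n)."""
    m = len(a_star)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(m)]       # common factors
    uniforms = []
    for s in sector:
        row = a_star[s - 1]
        rho_intra = sum(a * a for a in row)                # rho*_{map(i),map(i)}
        z = sum(row[j] * x_star[j] for j in range(m))
        z += sqrt(1.0 - rho_intra) * rng.gauss(0.0, 1.0)   # specific risk
        uniforms.append(PHI(z))
    return uniforms


if __name__ == "__main__":
    for row in implied_sector_matrix(A_STAR):
        print(" ".join(f"{x:7.4f}" for x in row))
    print(simulate_eig_uniforms(A_STAR, SECTOR, random.Random(7)))
```

The fourth step, $\tau_i = S_i^{-1}(u_i)$, depends on the survival functions and is left out of the sketch.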


Remark 47 The extension to the Student’s t copula is straightforward, because the multivariate Student’s t distribution is related to the multivariate normal distribution^103.


3.3.4.2 The factor model 


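In the previous paragraph, the multivariate survival function writes:

$$S(t_1, \ldots, t_n) = \mathbf{C}(S_1(t_1), \ldots, S_n(t_n); \mathcal{C})$$

where $\mathbf{C}$ is the Normal copula and $\mathcal{C}$ is the matrix of default correlations. In the sector approach, we note $\mathcal{C} = \mathrm{map}(\mathcal{C}^\star)$ where map is the mapping function and $\mathcal{C}^\star$ is the matrix of intra- and inter-correlations. In this model, we characterize the default time by the relationship $\tau_i < t \Leftrightarrow Z_i < B_i(t)$ where $Z_i = \sum_{j=1}^{m} A^\star_{\mathrm{map}(i), j} X^\star_j + \sqrt{1 - \rho^\star_{\mathrm{map}(i), \mathrm{map}(i)}}\, \varepsilon_i$ and $B_i(t) = \Phi^{-1}(PD_i(t)) = \Phi^{-1}(1 - S_i(t))$.

The risk factors $X^\star_j$ are not always easy to interpret. If $m = 1$, we retrieve $Z_i = \sqrt{\rho} \cdot X + \sqrt{1 - \rho} \cdot \varepsilon_i$ where $\rho$ is the uniform correlation and $X$ is the common factor. It generally corresponds to the economic cycle. Let us consider the case $m = 2$:

$$\mathcal{C}^\star = \begin{pmatrix} \rho_1 & \rho \\ \rho & \rho_2 \end{pmatrix}$$

where $\rho_1$ and $\rho_2$ are the intra-sector correlations and $\rho$ is the inter-sector correlation. We have:

$$Z_i = A^\star_{\mathrm{map}(i), 1} \cdot X^\star_1 + A^\star_{\mathrm{map}(i), 2} \cdot X^\star_2 + \sqrt{1 - \rho_{\mathrm{map}(i)}} \cdot \varepsilon_i$$

It is better to consider the following factor decomposition:

$$Z_i = \sqrt{\rho} \cdot X + \sqrt{\rho_{\mathrm{map}(i)} - \rho} \cdot X_{\mathrm{map}(i)} + \sqrt{1 - \rho_{\mathrm{map}(i)}} \cdot \varepsilon_i \qquad (3.56)$$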


103 See pages 737 and 1055.

In this case, we have three factors, and not two factors: $X$ is the common factor, whereas $X_1$ and $X_2$ are the two specific sector factors. We can extend the previous approach to a factor model with $m + 1$ factors:

$$Z_i = \sqrt{\rho} \cdot X + \sqrt{\rho_{\mathrm{map}(i)} - \rho} \cdot X_{\mathrm{map}(i)} + \sqrt{1 - \rho_{\mathrm{map}(i)}} \cdot \varepsilon_i \qquad (3.57)$$

Equations (3.56) and (3.57) are exactly the same, except the number of factors. However, the copula function associated to the factor model described by Equation (3.57) is the copula of the sector model, when we assume that the inter-sector correlation is the same for all the sectors, meaning that the off-diagonal elements of $\mathcal{C}^\star$ are equal. In this case, we can use the previous decomposition for simulating the default times. This algorithm called [CISC] (constant inter-sector correlation) requires simulating one additional random number compared to the [EIG] algorithm. However, the number of operations is reduced^104.
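A minimal sketch of the simulation of the asset values with the constant inter-sector correlation decomposition is given below. It assumes that every intra-sector correlation is at least as large as the inter-sector correlation, so that the square roots are defined; the function name `simulate_cisc_assets` and the numerical values in the usage example are hypothetical.

```python
import random
from math import sqrt


def simulate_cisc_assets(sector, rho_intra, rho_inter, rng):
    """One draw of (Z_1, ..., Z_n) with the m + 1 factor decomposition.

    sector    -- 1-based sector index of each credit
    rho_intra -- list of the m intra-sector correlations
    rho_inter -- common inter-sector correlation (must not exceed rho_intra)
    """
    x = rng.gauss(0.0, 1.0)                                # common factor X
    x_sector = [rng.gauss(0.0, 1.0) for _ in rho_intra]    # sector factors X_j
    z = []
    for s in sector:
        rho_s = rho_intra[s - 1]
        z.append(sqrt(rho_inter) * x
                 + sqrt(rho_s - rho_inter) * x_sector[s - 1]
                 + sqrt(1.0 - rho_s) * rng.gauss(0.0, 1.0))
    return z


if __name__ == "__main__":
    rng = random.Random(11)
    print(simulate_cisc_assets([1, 1, 2], [0.5, 0.4], 0.2, rng))
```

Each asset value uses three multiplications and two additions whatever the number of sectors, whereas the [EIG] loadings involve all $m$ factors.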


Let $\tau_1$ and $\tau_2$ be two default times, whose joint survival function is $S(t_1, t_2) = \mathbf{C}(S_1(t_1), S_2(t_2))$. We have:

$$\begin{aligned} S_1(t \mid \tau_2 = t^\star) &= \Pr\{\tau_1 > t \mid \tau_2 = t^\star\} \\ &= \partial_2 \mathbf{C}(S_1(t), S_2(t^\star)) \\ &= \mathbf{C}_{1|2}(S_1(t), S_2(t^\star)) \end{aligned}$$

where $\mathbf{C}_{1|2}$ is the conditional copula function^105. If $\mathbf{C} \neq \mathbf{C}^\perp$, the default probability of one firm changes when another firm defaults (Schmidt and Ward, 2002). This implies that the credit spread of the first firm jumps at the default time of the second firm. This phenomenon is called spread jump or jump-to-default (JTD). Sometimes it might be difficult to explain the movements of these spread jumps in terms of copula functions. The interpretation is easier when we consider a factor model. For example, we consider the Basel model. Figures 3.41 to 3.45 show the jumps of the hazard function of the S&P one-year transition matrix for corporate bonds given in Table 3.34 on page 208. We recall that the rating $R(t) = K$ corresponds to the default state and we note $R(0) = i$ the initial rating of the firm. We have seen that $S_i(t) = 1 - e_i^\top \exp(t\Lambda)\, e_K$ where $\Lambda$ is the Markov generator. The hazard function is equal to:

$$\lambda_i(t) = \frac{f_i(t)}{S_i(t)} = \frac{e_i^\top \Lambda \exp(t\Lambda)\, e_K}{1 - e_i^\top \exp(t\Lambda)\, e_K}$$

We deduce that:

$$\lambda_{i_1}(t \mid \tau_{i_2} = t^\star) = \frac{f_{i_1}(t \mid \tau_{i_2} = t^\star)}{S_{i_1}(t \mid \tau_{i_2} = t^\star)}$$

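With the Basel copula, we have:

$$S_{i_1}(t \mid \tau_{i_2} = t^\star) = \Phi\left(\frac{\Phi^{-1}(S_{i_1}(t)) - \rho\, \Phi^{-1}(S_{i_2}(t^\star))}{\sqrt{1 - \rho^2}}\right)$$

and, by differentiating with respect to $t$:

$$f_{i_1}(t \mid \tau_{i_2} = t^\star) = \phi\left(\frac{\Phi^{-1}(S_{i_1}(t)) - \rho\, \Phi^{-1}(S_{i_2}(t^\star))}{\sqrt{1 - \rho^2}}\right) \cdot \frac{f_{i_1}(t)}{\sqrt{1 - \rho^2}\; \phi\left(\Phi^{-1}(S_{i_1}(t))\right)}$$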
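The conditional survival function and the conditional hazard rate can be evaluated directly once the marginal survival functions are known. The sketch below assumes exponential default times (hazard rates `lam1` and `lam2`) instead of the rating-based survival functions used for the figures, and the conditional density is obtained by differentiating the conditional survival function of the Normal copula; both choices, and the function names, are assumptions of this illustration.

```python
from math import exp, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def conditional_survival(t, t_star, lam1, lam2, rho):
    """Pr{tau_1 > t | tau_2 = t_star} with the Normal copula of parameter rho."""
    s1, s2 = exp(-lam1 * t), exp(-lam2 * t_star)
    arg = (N01.inv_cdf(s1) - rho * N01.inv_cdf(s2)) / sqrt(1.0 - rho * rho)
    return N01.cdf(arg)


def conditional_hazard(t, t_star, lam1, lam2, rho):
    """Conditional hazard rate f_1(t | tau_2 = t_star) / S_1(t | tau_2 = t_star)."""
    s1, s2 = exp(-lam1 * t), exp(-lam2 * t_star)
    q1 = N01.inv_cdf(s1)
    arg = (q1 - rho * N01.inv_cdf(s2)) / sqrt(1.0 - rho * rho)
    f1 = lam1 * s1                                   # unconditional density
    f_cond = N01.pdf(arg) * f1 / (sqrt(1.0 - rho * rho) * N01.pdf(q1))
    return f_cond / N01.cdf(arg)


if __name__ == "__main__":
    for t_star in (1.0, 10.0, 100.0):
        h = conditional_hazard(t=5.0, t_star=t_star, lam1=0.01, lam2=0.02, rho=0.5)
        print(f"t* = {t_star:6.1f}: conditional hazard = {1e4 * h:.1f} bps")
```

With a positive correlation, an unexpectedly early default of the second firm raises the conditional hazard of the first firm above its unconditional level, while a very late default lowers it.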


104 For the [EIG] algorithm, we have $n \times (m + 1)$ operations (+ and ×), while we have $3n$ elementary operations for the [CISC] algorithm.

105 The mathematical analysis of conditional copulas is given on page 737.

The reference to the factor model allows an easier interpretation of the jumps of the hazard rate. For example, it is obvious that the default of a CCC-rated company in ten years implies a negative jump for the well rated companies (Figure 3.45). Indeed, this indicates that the high idiosyncratic risk of the CCC-rated firm has been compensated by a good economic cycle (the common risk factor $X$). If the default of the CCC-rated company had occurred at an early stage, the jumps would have been almost zero, because we can think that the default is due to the specific risk of the company. On the contrary, if a AAA-rated company defaults, the jump would be particularly high as the default is sudden, because it is more explained by the common risk factor than by the specific risk factor (Figure 3.42). We deduce that there is a relationship between jump-to-default and default correlation.

FIGURE 3.41: Hazard function $\lambda_i(t)$ (in bps)


3.3.4.3 Estimation methods 


The Normal copula model with sector correlations requires the estimation of the matrix $\mathcal{C}^\star$, which is loosely called the default correlation matrix. In order to clarify this notion, we make the following distinctions:


• the ‘canonical or copula correlations’ correspond to the parameter matrix of the copula function that models the dependence between the defaults;

• the ‘default time correlations’ are the correlations between the default times $(\tau_1, \ldots, \tau_n)$; they depend on the copula function, but also on the unidimensional survival functions;

• the ‘discrete default correlations’ are the correlations between the indicator functions $(D_1(t), \ldots, D_n(t))$; they depend on the copula function, the unidimensional survival functions and the time horizon $t$; this is why we do not have a unique default correlation between two firms, but a term structure of default correlations;

• the ‘asset correlations’ are the correlations between the asset values in the Merton model;

• the ‘equity correlations’ are the correlations between the stock returns; in a Merton-like model, they are assumed to be equal to the asset correlations.

FIGURE 3.42: Hazard function $\lambda_i(t)$ (in bps) when a AAA-rated company defaults after 10 years ($\rho$ = 5%)

FIGURE 3.43: Hazard function $\lambda_i(t)$ (in bps) when a AAA-rated company defaults after 10 years ($\rho$ = 50%)

FIGURE 3.44: Hazard function $\lambda_i(t)$ (in bps) when a BB-rated company defaults after 10 years ($\rho$ = 50%)

FIGURE 3.45: Hazard function $\lambda_i(t)$ (in bps) when a CCC-rated company defaults after 10 years ($\rho$ = 50%)


In practice, the term ‘default correlation’ is used as a generic term for these different mea- 
sures. 


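Relationship between the different default correlations We consider two firms. Li (2000) introduces two measures of default correlation. The discrete default correlation is equal to:

$$\rho(t_1, t_2) = \frac{\mathbb{E}[D_1(t_1) D_2(t_2)] - \mathbb{E}[D_1(t_1)]\, \mathbb{E}[D_2(t_2)]}{\sigma(D_1(t_1))\, \sigma(D_2(t_2))}$$

where $D_i(t_i) = \mathbb{1}\{\tau_i \leq t_i\}$, whereas the default (or survival) time correlation is equal to:

$$\rho(\tau_1, \tau_2) = \frac{\mathbb{E}[\tau_1 \tau_2] - \mathbb{E}[\tau_1]\, \mathbb{E}[\tau_2]}{\sigma(\tau_1)\, \sigma(\tau_2)}$$

These two measures give very different numerical results. Concerning the asset correlation, it is equal to:

$$\rho(Z_1, Z_2) = \frac{\mathbb{E}[Z_1 Z_2] - \mathbb{E}[Z_1]\, \mathbb{E}[Z_2]}{\sigma(Z_1)\, \sigma(Z_2)}$$

These three measures depend on the canonical correlation. Let us denote by $\rho$ the copula parameter of the Normal copula between the two default times $\tau_1$ and $\tau_2$. We have:

$$\rho(t_1, t_2) = \frac{\mathbf{C}(PD_1(t_1), PD_2(t_2); \rho) - PD_1(t_1) \cdot PD_2(t_2)}{\sqrt{PD_1(t_1)(1 - PD_1(t_1))} \cdot \sqrt{PD_2(t_2)(1 - PD_2(t_2))}}$$

and:

$$\rho(\tau_1, \tau_2) = \frac{\operatorname{cov}(\tau_1, \tau_2)}{\sqrt{\operatorname{var}(\tau_1) \cdot \operatorname{var}(\tau_2)}}$$

where $\operatorname{cov}(\tau_1, \tau_2) = \int_0^\infty \int_0^\infty (S(t_1, t_2) - S_1(t_1) S_2(t_2))\,\mathrm{d}t_1\,\mathrm{d}t_2$ and $\operatorname{var}(\tau_i) = 2 \int_0^\infty t S_i(t)\,\mathrm{d}t - \left[\int_0^\infty S_i(t)\,\mathrm{d}t\right]^2$. We verify that $\rho(t_1, t_2) \neq \rho$ and $\rho(\tau_1, \tau_2) \neq \rho$. We can also show that $\rho(t_1, t_2) < \rho$ and $\rho(\tau_1, \tau_2) < \rho$ for the Normal copula. In the Basel model, we have $\rho(Z_1, Z_2) = \rho$.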

We consider two exponential default times $\tau_1 \sim \mathcal{E}(\lambda_1)$ and $\tau_2 \sim \mathcal{E}(\lambda_2)$. In Tables 3.42, 3.43 and 3.44, we report the discrete default correlations $\rho(t_1, t_2)$ for different time horizons. We notice that $\rho(t_1, t_2)$ is much lower than 20%, which is the copula correlation. We have also calculated $\rho(\tau_1, \tau_2)$, which is respectively equal to 17.0%, 21.5% and 18.0%. We notice that the correlations are higher for the Student’s t copula than for the Normal copula^106.
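The discrete default correlation for the Normal copula can be computed with a one-dimensional numerical integration. The sketch below evaluates the bivariate Normal copula as $\int_{-\infty}^{a} \phi(x)\, \Phi((b - \rho x)/\sqrt{1 - \rho^2})\,\mathrm{d}x$ with Simpson's rule, where $a$ and $b$ are the Gaussian quantiles of the two default probabilities; the truncation of the lower bound at $-10$, the number of steps and the function names are assumptions of this illustration.

```python
from math import exp, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def normal_copula(u1, u2, rho, steps=2000):
    """Bivariate Normal copula C(u1, u2; rho) by Simpson's rule (steps even)."""
    a, b = N01.inv_cdf(u1), N01.inv_cdf(u2)
    lower = -10.0                      # truncation of the lower bound
    if a <= lower:
        return 0.0
    h = (a - lower) / steps

    def integrand(x):
        return N01.pdf(x) * N01.cdf((b - rho * x) / sqrt(1.0 - rho * rho))

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0


def discrete_default_correlation(t1, t2, lam1, lam2, rho):
    """rho(t1, t2) for exponential default times and a Normal copula."""
    pd1 = 1.0 - exp(-lam1 * t1)
    pd2 = 1.0 - exp(-lam2 * t2)
    joint = normal_copula(pd1, pd2, rho)
    return (joint - pd1 * pd2) / sqrt(pd1 * (1 - pd1) * pd2 * (1 - pd2))


if __name__ == "__main__":
    for t in (1, 5, 10, 50):
        r = discrete_default_correlation(t, t, lam1=0.01, lam2=0.005, rho=0.20)
        print(f"t1 = t2 = {t:2d}: {100 * r:.1f}%")
```

With $\lambda_1$ = 100 bps, $\lambda_2$ = 50 bps and $\rho$ = 20%, the diagonal values should be close to those of Table 3.42 (about 2.0% at one year and 11.5% at fifty years).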


Statistical inference of the default correlation In the case of a factor model, we have:

$$Z_{i,t} = \beta^\top X_{i,t} + \sqrt{1 - \|\beta\|_2^2}\; \varepsilon_{i,t}$$

where $Z_{i,t}$ is the standardized asset value of the $i$-th firm at time $t$ and $X_{i,t}$ is the standardized vector of risk factors at time $t$ for the $i$-th firm. We can then estimate the parameter $\beta$ using OLS or GMM techniques. Let us consider the constant inter-sector correlation model:

$$Z_i = \sqrt{\rho} \cdot X + \sqrt{\rho_{\mathrm{map}(i)} - \rho} \cdot X_{\mathrm{map}(i)} + \sqrt{1 - \rho_{\mathrm{map}(i)}} \cdot \varepsilon_i$$


106 This phenomenon is explained in the chapter dedicated to the copula functions.

TABLE 3.42: Discrete default correlation in % ($\lambda_1$ = 100 bps, $\lambda_2$ = 50 bps, Normal copula with $\rho$ = 20%)

t1 / t2      1      2      3      4      5     10     25     50
1          2.0    2.4    2.7    2.9    3.1    3.6    4.2    4.
2          2.3    2.9    3.3    3.6    3.8    4.5    5.3    5.7
3          2.6    3.2    3.6    4.0    4.2    5.0    6.0    6.5
4          2.7    3.4    3.9    4.2    4.5    5.4    6.5    7.1
5          2.9    3.6    4.1    4.5    4.8    5.7    6.9    7.5
10         3.2    4.1    4.7    5.1    5.5    6.6    8.2    9.1
25         3.4    4.5    5.1    5.7    6.1    7.5    9.6   10.9
50         3.3    4.4    5.1    5.6    6.1    7.6    9.9   11.5

TABLE 3.43: Discrete default correlation in % ($\lambda_1$ = 100 bps, $\lambda_2$ = 50 bps, Student’s t copula with $\rho$ = 20% and $\nu$ = 4)

t1 / t2      1      2      3      4      5     10     25     50
1         13.9   14.5   14.5   14.3   14.0   12.6    9.8    7.2
2         12.8   14.3   14.8   14.9   14.9   14.3   11.9    9.2
3         11.9   13.7   14.5   14.9   15.1   15.0   13.1   10.4
4         11.2   13.1   14.1   14.6   14.9   15.3   13.8   11.3
5         10.6   12.6   13.7   14.3   14.7   15.4   14.3   11.9
10         8.5   10.5   11.8   12.6   13.3   14.8   15.2   13.6
25         5.5    7.2    8.3    9.2    9.9   11.9   14.0   14.3
50         3.3    4.5    5.3    5.9    6.5    8.3   11.0   12.6

TABLE 3.44: Discrete default correlation in % ($\lambda_1$ = 20%, $\lambda_2$ = 10%, Normal copula with
$\rho$ = 20%)

t1 / t2 |    1     2     3     4     5    10    25    50
      1 |  8.8  10.2  10.7  11.0  11.1  10.4   6.6   2.4
      2 |  9.4  11.0  11.8  12.1  12.3  11.9   7.9   3.1
      3 |  9.3  11.0  11.9  12.4  12.7  12.    8.6   3.4
      4 |  9.0  10.8  11.7  12.2  12.6  12.6   8.9   3.7
      5 |  8.6  10.4  11.3  11.9  12.3  12.4   9.0   3.8
     10 |  6.3   7.8   8.7   9.3   9.7  10.3   8.1   3.7
     25 |  1.9   2.4   2.8   3.1   3.3   3.8   3.5   1.9
     50 |  0.2   0.3   0.3   0.3   0.4   0.5   0.5   0.3

The corresponding linear regression is:

$$Z_{i,t} = \beta_0 \cdot X_{0,t} + \beta^\top X_{i,t} + \sqrt{1 - \rho_{map(i)}} \cdot \varepsilon_{i,t}$$

where $X_{i,t}$ is equal to $e_i \odot X_t$, $X_t$ is the set of the risk factors, which are specific to the
sectors at time $t$ and $X_{0,t}$ is the common risk factor. We deduce that the estimates of $\rho$
and $\rho_1, \ldots, \rho_m$ are given by the following relationships: $\hat{\rho} = \hat{\beta}_0^2$ and $\hat{\rho}_j = \hat{\beta}_0^2 + \hat{\beta}_j^2$.

A second approach is to consider the correlation between the default rates of homogeneous
cohorts¹⁰⁷. This correlation converges asymptotically to the survival time correlation.
Then, we have to invert the relationship between the survival time correlation and the copula
correlation in order to estimate the parameters of the copula function.

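The third approach has been suggested by Gordy and Heitfield (2002). They consider
the Basel model: $Z_i = \sqrt{\rho} \cdot X + \sqrt{1 - \rho} \cdot \varepsilon_i$, where $X \sim H$ and $\varepsilon_i \sim \mathcal{N}(0, 1)$. The default
probability conditionally to $X = x$ is equal to:

$$p_i(x; B_i, \rho) = \Phi\left(\frac{B_i - \sqrt{\rho} \, x}{\sqrt{1 - \rho}}\right)$$
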
We note $d_t$ the number of defaulted firms and $n_t$ the total number of firms at time $t$. If we
have a historical sample of default rates, we can estimate $\rho$ using the method of maximum
likelihood. Let $\ell_t(\theta)$ be the log-likelihood of the observation $t$. If we assume that there is
only one risk class $\mathcal{C}$ ($B_i = B$), the conditional number of defaults $D_t$ is a binomial random
variable:

$$\Pr\{D_t = d_t \mid X = x\} = \binom{n_t}{d_t} \, p(x; B, \rho)^{d_t} \, (1 - p(x; B, \rho))^{n_t - d_t}$$

We deduce that:

$$\ell_t(\theta) = \ln \int \Pr\{D_t = d_t \mid X = x\} \, dH(x) = \ln \int \binom{n_t}{d_t} \, p(x; B, \rho)^{d_t} \, (1 - p(x; B, \rho))^{n_t - d_t} \, dH(x)$$


Generally, we consider a one-year time horizon for calculating default rates. Moreover, if we
assume that the common factor $X$ is Gaussian, we deduce that $B = \Phi^{-1}(\mathrm{PD})$ where PD
is the one-year default probability for the risk class $\mathcal{C}$. It follows that:

$$\ell_t(\theta) = \ln \int \binom{n_t}{d_t} \, p(x; \Phi^{-1}(\mathrm{PD}), \rho)^{d_t} \, (1 - p(x; \Phi^{-1}(\mathrm{PD}), \rho))^{n_t - d_t} \, d\Phi(x)$$

Therefore, we can estimate the parameter $\rho$. If there are several risk classes, we can assume
that:

$$\ell_t(\theta) = \ln \int \binom{n_t}{d_t} \, p(x; B, \rho)^{d_t} \, (1 - p(x; B, \rho))^{n_t - d_t} \, d\Phi(x)$$


In this case, we have two parameters to estimate: the copula correlation p and the implied 
default barrier B. 

The underlying idea of this approach is that the distribution of the default rate depends 
on the default probability and the copula correlation. More specifically, the mean of the 
default rate of a risk class C is equal to the default probability of C whereas the volatility 
of the default rate is related to the default correlation. We introduce the notation: 


$$f_t = \frac{d_t}{n_t}$$

where $f_t$ is the default rate at time $t$. We assume that the one-year default probability of
$\mathcal{C}$ is equal to 20%. In Figure 3.46, we report the distribution of the one-year default rate
for different values of $\rho$ when the number of firms $n_t$ is equal to 1000. We also report some
statistics (mean, standard deviation and quantile functions) in Table 3.45. By definition,
the four probability distributions have the same mean, which is equal to 20%, but their
standard deviations are different. If $\rho$ = 0%, $\sigma(f_t)$ is equal to 1.3% while $\sigma(f_t)$ = 33.2% in
the case $\rho$ = 90%.

107 Each cohort corresponds to a risk class.

FIGURE 3.46: Distribution of the default rate (in %)


TABLE 3.45: Statistics of the default rate (in %)

                        |              Q_α(f_t)
   ρ   μ(f_t)  σ(f_t)   |   1%   10%   25%   50%   75%   90%    99%
  0%    20.0     1.3    | 17.1  18.4  19.1  20.0  20.8  21.6   23.0
 20%    20.0    13.0    |  1.7   5.6  10.0  17.4  27.3  38.3   59.0
 50%    20.0    21.7    |  0.0   0.6   3.1  11.7  30.3  53.8   87.3
 90%    20.0    33.2    |  0.0   0.0   0.0   0.4  26.3  88.2  100.0
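The link between the copula correlation and the volatility of the default rate can be checked with the law of total variance: $\sigma^2(f_t) = \mathbb{E}[p(X)(1 - p(X))] / n_t + \mathrm{var}(p(X))$. The sketch below assumes a Gaussian common factor and a trapezoidal integration rule; the function name `default_rate_moments` is hypothetical. With PD = 20% and $n_t$ = 1000, it should give values close to the first two columns of Table 3.45.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()


def default_rate_moments(pd, rho, n_firms, n_nodes=1600, lim=8.0):
    """Mean and standard deviation of f_t = d_t / n_t in the one-factor model."""
    barrier = N.inv_cdf(pd)
    h = 2.0 * lim / n_nodes
    m1 = m2 = 0.0  # E[p(X)] and E[p(X)^2]
    for k in range(n_nodes + 1):
        x = -lim + k * h
        p = N.cdf((barrier - sqrt(rho) * x) / sqrt(1.0 - rho))
        w = N.pdf(x) * h
        m1 += w * p
        m2 += w * p * p
    # binomial noise given X, plus the variance of the conditional PD
    variance = (m1 - m2) / n_firms + (m2 - m1 * m1)
    return m1, sqrt(variance)


for rho in (0.0, 0.2, 0.5, 0.9):
    mean, std = default_rate_moments(0.20, rho, 1000)
    print(f"rho = {rho:4.0%}  mean = {100 * mean:5.1f}%  std = {100 * std:5.1f}%")
```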


Example 42 We consider a risk class C, whose probability of default is equal to 200 bps. 
Over the last 20 years, we have observed the following annual number of defaults: 3, 1, 14, 
0, 33, 3, 53, 1, 4, 0, 1, 8, 7, 3, 5, 5, 0, 49, 0 and 7. We assume that the number of firms is 
equal to 500 every year. 


If we estimate the Basel model with the method of maximum likelihood by assuming
that $B = \Phi^{-1}(\mathrm{PD})$, we obtain $\hat{\rho}$ = 28.93%. If we estimate both the default correlation
and the default barrier, we have $\hat{\rho}$ = 28.58% and $\hat{B}$ = −2.063, which is equivalent to a
default probability of 195 bps. It is better to estimate the barrier if we don't trust the
default probability of the risk class because the estimation can be biased. For instance, if
we assume that PD = 100 bps, we obtain $\hat{\rho}$ = 21.82%, which is markedly lower than the
previous estimate.
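The maximum likelihood estimation of Example 42 can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the text: the integral over the Gaussian factor is computed with a trapezoidal rule, the barrier is fixed at $B = \Phi^{-1}(\mathrm{PD})$, and the maximization uses a golden-section search, which assumes that the log-likelihood is unimodal in $\rho$. The names `log_likelihood` and `golden_max` are hypothetical. The estimate should land in the neighborhood of the value reported above, although the exact digits depend on the integration scheme.

```python
from collections import Counter
from math import comb, log, sqrt
from statistics import NormalDist

N = NormalDist()
DEFAULTS = [3, 1, 14, 0, 33, 3, 53, 1, 4, 0, 1, 8, 7, 3, 5, 5, 0, 49, 0, 7]
N_FIRMS = 500


def log_likelihood(rho, barrier, defaults=DEFAULTS, n_firms=N_FIRMS,
                   n_nodes=1600, lim=8.0):
    """Sum over the years of ln Pr{D_t = d_t}, with a Gaussian factor X."""
    h = 2.0 * lim / n_nodes
    xs = [-lim + k * h for k in range(n_nodes + 1)]
    p = [N.cdf((barrier - sqrt(rho) * x) / sqrt(1.0 - rho)) for x in xs]
    w = [N.pdf(x) * h for x in xs]
    ll = 0.0
    for d, count in Counter(defaults).items():  # group identical observations
        integral = sum(wk * pk ** d * (1.0 - pk) ** (n_firms - d)
                       for pk, wk in zip(p, w))
        ll += count * (log(comb(n_firms, d)) + log(integral))
    return ll


def golden_max(f, a, b, tol=1e-5):
    """Golden-section search for the maximum of a unimodal function on [a, b]."""
    g = (sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


barrier = N.inv_cdf(0.02)  # PD = 200 bps
rho_hat = golden_max(lambda r: log_likelihood(r, barrier), 0.01, 0.99)
print(f"estimated copula correlation: {100 * rho_hat:.2f}%")
```

Estimating the barrier jointly with $\rho$ would require a two-dimensional optimizer, which is left out of this sketch.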

The previous estimation method has been generalized by Demey et al. (2004) to the 
CISC model with several intra-sector correlations, but a unique inter-sector correlation. In 
Table 3.46, we report their results for the period between 1981 and 2002. We notice that
the default correlations are relatively low, ranging between 7% and 36%. The largest correlations are
observed for the sectors of energy, finance, real estate, telecom and utilities. We also notice 
some significant differences between the Basel model and the CISC model. 


TABLE 3.46: Estimation of canonical default correlations 


Sector CISC model Basel model
Aerospace/Automobile 11.2% 11.6%
Consumer/Service sector 8.7% 7.5%
Energy/Natural resources 21.3% 11.5%
Financial institutions 15.7% 12.2%
Forest/Building products 6.8% 14.5%
Health 8.3% 9.2%
High technology 6.8% 4.7%
Insurance 12.2% 7.6%
Leisure time/Media 7.0% 7.0%
Real estate 35.9% 27.7%
Telecom 27.1% 34.3%
Transportation 6.8% 8.3%
Utilities 18.3% 21.2%
Inter-sector 6.8% Vv 


Source: Demey et al. (2004). 


Remark 48 There are very few publications on default correlations. Moreover, they
generally concern the one-year discrete default correlations $\rho(1, 1)$, not the copula correlation.
For example, Nagpal and Bahar (2001) estimate $\rho(t_1, t_2)$ for US corporates and the
period 1981-1999. They distinguish the different sectors, three time horizons (1Y, 5Y and
7Y) and IG/HY credit ratings. Even if the range goes from −5.35% to 39.35%, they obtain a
very low correlation on average. However, these results should be taken with caution, because
we know that the default correlation has increased since the 2008 Global Financial Crisis
(Christoffersen et al., 2017).


3.3.4.4 Dependence and credit basket derivatives 


Interpretation and pitfalls of the Basel copula The Basel copula is the basic model 
for pricing CDO tranches, just as the Black-Scholes model is for options. We define the 
implied correlation as the parameter p that gives the market spread of the CDO tranche. 
In some sense, the implied correlation for CDOs is the equivalent of the implied volatility 
for options. Since the implied correlation depends on attachment and detachment points of 
the CDO tranche, we don’t have a single value, but a curve which is not flat. Therefore, we 
observe a correlation smile or skew, meaning that the correlation is not constant. 

In order to understand this phenomenon, we come back to the economic interpretation
of the Basel model. In Figure 3.47, we report the mapping between the economic cycle and
the common risk factor $X$. In this case, negative values of $X$ correspond to bad economic
times whereas positive values of $X$ correspond to good economic times. We notice that the
factor model does not encompass the dynamics of the economic cycle. The Basel model is
typically a through-the-cycle approach, and not a point-in-time approach, meaning that the
time horizon is the long-run (typically an economic cycle of 7 years).

FIGURE 3.47: Economic interpretation of the common factor X
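We recall that the loss function is $L = \sum_{i=1}^n \mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\}$. Let $A$ and $D$ be
the attachment and detachment points of the tranche. We have:

$$\mathbb{E}[L \mid A \le L < D] = \mathbb{E}_X[L(X) \mid A \le L < D]$$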


where $L(X)$ is the conditional loss with respect to the common factor $X$. With this model,
the pricing of a CDO tranche uses all the economic scenarios, which are equally weighted.
In practice, we know that market participants are more sensitive to bad economic times and
have a shorter time horizon than the duration of an economic cycle. From a mathematical
point of view, this implies that the factor component $\sqrt{\rho} X$ is certainly not Gaussian and
symmetric about 0. Two directions have then been investigated in order to introduce skewness
in credit risk modeling. The first approach assumes that the copula correlation $\rho$ is not
constant but stochastic, while the second approach states that the copula correlation is a
function of the common factor $X$. These two approaches are two visions of the link between
default correlations and the economic cycle.
Stochastic correlation model We consider an extension of the Basel model:

$$Z_i = \sqrt{R_i} \, X + \sqrt{1 - R_i} \, \varepsilon_i$$

where $R_i \in [0, 1]$ is a random variable that represents the stochastic correlation (Andersen
and Sidenius, 2005). We notice that the conditional process $Z_i \mid R_i = \rho$ remains Gaussian,
whereas the conditional probability of default becomes:

$$p_i(X) = \int_0^1 \Phi\left(\frac{B_i - \sqrt{\rho} \, X}{\sqrt{1 - \rho}}\right) dG(\rho)$$

where $G$ is the probability distribution of $R_i$. Burtschell et al. (2007) propose to model the
stochastic correlation $R_i$ as a binary random variable:

$$R_i = (1 - Y_i) \sqrt{\rho_1} + Y_i \sqrt{\rho_2}$$

where $Y_i$ is a Bernoulli random variable $\mathcal{B}(p)$. For example, if $p$ = 5%, $\rho_1$ = 0% and
$\rho_2$ = 100%, the defaults are uncorrelated most of the time and perfectly correlated in 5%
of cases. The copula of default times is then a mixture of the copula functions $C^\perp$ and $C^+$
as shown in Figure 3.48. From an economic point of view, we obtain a two-regime model.

FIGURE 3.48: Dependogram of default times in the stochastic correlation model


Local correlation model In this model, we have:

$$Z_i = \beta(X) \, X + \sqrt{1 - \|\beta(X)\|_2^2} \, \varepsilon_i$$

where the factor loading $\beta(X)$ is a function of the factor $X$, meaning that $\beta(X)$ depends
on the position in the economic cycle. In Figure 3.49, we consider two functions: $\beta_0(X)$
is constant (Basel model) and $\beta_1(X)$ decreases with the common factor. In this last case,
the factor loading is high in bad economic times, meaning that the default correlation
$\rho = \beta^2(X)$ is larger at the bottom of the economic cycle than at its top. This implies that
the latent variable $Z_i$ is not Gaussian and exhibits skewness and excess kurtosis. We
verify this property on the normalized probability density function of the factor component
$\beta(X) X$ (bottom/right panel in Figure 3.49). This specification has an impact on the joint
distribution of defaults. For example, we report the empirical copula of default times in
Figure 3.50 when the factor loading is $\beta_1(X)$. We notice that this copula function is not
symmetric and the joint dependence of defaults is very high in bad economic times when
the value of $X$ is low. When $\beta(X)$ is a decreasing function of $X$, we observe a correlation
skew. It is equivalent to changing the probability measure in order to penalize the bad states
of the economic cycle or to introducing a risk premium due to the misalignment between the
time horizon of investors and the duration of the economic cycle.

FIGURE 3.49: Distribution of the latent variable Z in the local correlation model

FIGURE 3.50: Dependogram of default times in the local correlation model

To implement this model, we consider the normalization $Z_i^\star = \sigma_Z^{-1} (Z_i - m_Z)$ where:

$$m_Z = \mathbb{E}[Z_i] = \int_{-\infty}^{+\infty} \beta(x) \, x \, \phi(x) \, dx$$

and:

$$\sigma_Z^2 = \mathrm{var}(Z_i) = \int_{-\infty}^{+\infty} \left(1 - \beta^2(x) + \beta^2(x) x^2\right) \phi(x) \, dx - m_Z^2$$

We notice that the probability distribution of the latent variable $Z_i^\star$ is equal to:

$$F^\star(z) = \Pr\{Z_i^\star \le z\} = \int_{-\infty}^{+\infty} \Phi\left(\frac{m_Z + \sigma_Z z - \beta(x) x}{\sqrt{1 - \|\beta(x)\|_2^2}}\right) \phi(x) \, dx$$

To simulate correlated defaults¹⁰⁸, we use the inversion method such that $U_i = F^\star(Z_i^\star)$.
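The normalization step can be sketched numerically. The code below is a minimal illustration, assuming a scalar factor loading passed as a Python function and a trapezoidal rule on $[-8, 8]$; the names `latent_moments`, `latent_cdf`, `basel_beta` and `skewed_beta` are hypothetical, and the step function used for `skewed_beta` is only a toy example, not the parametrization of the text. With a constant loading, the model reduces to the Basel model, so that $m_Z = 0$, $\sigma_Z = 1$ and $F^\star = \Phi$.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()


def _grid(n_nodes=1600, lim=8.0):
    """Nodes and step of the trapezoidal rule for the common factor X."""
    h = 2.0 * lim / n_nodes
    return [(-lim + k * h, h) for k in range(n_nodes + 1)]


def latent_moments(beta):
    """m_Z and sigma_Z of Z = beta(X) X + sqrt(1 - beta(X)^2) eps."""
    m = second = 0.0
    for x, h in _grid():
        b, w = beta(x), N.pdf(x) * h
        m += w * b * x
        second += w * (1.0 - b * b + b * b * x * x)
    return m, sqrt(second - m * m)


def latent_cdf(z, beta):
    """F*(z) = Pr{(Z - m_Z) / sigma_Z <= z}."""
    m, s = latent_moments(beta)
    total = 0.0
    for x, h in _grid():
        b = beta(x)
        idio = sqrt(max(1.0 - b * b, 1e-12))  # guard against beta(x) = 1
        total += N.cdf((m + s * z - b * x) / idio) * N.pdf(x) * h
    return total


def basel_beta(x):
    return sqrt(0.20)  # constant loading: the Basel model


def skewed_beta(x):
    return 0.9 if x < 0 else sqrt(0.20)  # toy loading, higher in bad times


print(latent_moments(basel_beta), latent_moments(skewed_beta))
```

In a simulation, one would tabulate `latent_cdf` once on a grid of values of $z$ and interpolate, rather than recompute the integral for every draw.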


We consider the following parametrization:

$$\beta(x) = \begin{cases} 1 - (1 - \sqrt{\rho}) \, e^{-\frac{1}{2} \alpha x^2} & \text{if } x < 0 \\ \sqrt{\rho} & \text{if } x \ge 0 \end{cases}$$

The function $\beta(x)$ depends on two parameters $\rho$ and $\alpha$. The local correlation $\rho(x) = \beta^2(x)$
is given in Figure 3.51. The parameter $\rho$ represents the default correlation when the economic
cycle is good or the common factor $X$ is positive. We also notice that the local correlation
$\rho(x)$ tends to 1 when $x$ tends to $-\infty$. This implies an absolute contagion of the default
times when the economic situation is dramatic. The parameter $\alpha$ is then a measure of the
contagion intensity when the economic cycle is unfavorable. Figure 3.52 shows the base
correlations¹⁰⁹ which are generated by this model¹¹⁰. We observe that these concave skews
are consistent with those observed in the market.


In Figure 3.53, we report the base correlation of the 5Y European iTraxx index at
the date of 14 June 2005. The estimation of the local correlation model gives $\rho$ = 0.5%
and $\alpha$ = 60%. We notice that the calibrated model fits the correlation skew of the
market well. Moreover, the calibrated model implies an asymmetric distribution and a left fat
tail of the factor component (top/right panel in Figure 3.54) and an implied economic cycle,
which is more flattened than the economic cycle derived from a Gaussian distribution. In
particular, we observe small differences within good economic times and large differences
within bad economic times. If we consider the copula function, we find that defaults are
generally weakly correlated except during deep economic crises. Let us consider the ordinal
sum of the two copula functions $C^\perp$ and $C^+$. This copula is represented in Figure 3.55. The
10% worst economic scenarios correspond to the perfect dependence (copula $C^+$) whereas
the remaining 90% economic scenarios correspond to the zero-correlation situation (copula
$C^\perp$). We notice that this copula function fits the correlation skew very well. We conclude
that market participants underestimate default correlations in good times and overestimate
default correlations in bad times.


108 We calculate $m_Z$, $\sigma_Z$ and $F^\star(z)$. For $F^\star(z)$, we consider a meshgrid $(z_k)$. When $z \in (z_k, z_{k+1})$, we use
the linear or the Gaussian interpolation.

109 The base correlation is the implied correlation of an equity tranche, where the attachment point is equal
to 0 and the detachment point is equal to the strike.

110 We consider a CDO with a five-year maturity. The coupons are paid every quarter. The portfolio of
underlying assets is homogeneous with a spread of 100 bps and a recovery rate of 40%. The pricing is done
with the Monte Carlo method and one million simulations.

FIGURE 3.52: Correlation skew generated by the local correlation model

FIGURE 3.53: Calibration of the correlation skew (local correlation model)

FIGURE 3.54: Implied local correlation model

FIGURE 3.55: Calibration of the correlation skew (ordinal sum of $C^\perp$ and $C^+$)


3.3.5 Granularity and concentration 


The risk contribution of the Basel model has been obtained under the assumption that
the portfolio is infinitely fine-grained. In this case, the common risk factor $X$ largely dominates
the specific risk factors $\varepsilon_i$. When the portfolio is concentrated in a small number of
credits, the risk contribution formula, which has been derived on page 173, is not valid.
In this case, the Basel regulation requires calculating additional capital. In the second
consultative paper on the Basel II Accord (BCBS, 2001a), the Basel Committee suggested
complementing the IRB-based capital with a 'granularity adjustment' that captures the risk
of concentration. In the end, the Basel Committee abandoned the idea of calculating this
additional capital in the first pillar. In fact, this granularity adjustment is today treated in the
second pillar, and falls under the internal capital adequacy assessment process (ICAAP).


3.3.5.1 Difference between fine-grained and concentrated portfolios 


Definition of the granularity adjustment We recall that the portfolio loss is given
by:

$$L = \sum_{i=1}^n \mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\} \qquad (3.58)$$

Under the assumption that the portfolio is infinitely fine-grained (IFG), we have shown that
the one-year value-at-risk is given by¹¹¹:

$$\mathrm{VaR}_\alpha(w_{IFG}) = \sum_{i=1}^n \mathrm{EAD}_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \Phi\left(\frac{\Phi^{-1}(\mathrm{PD}_i) + \sqrt{\rho} \, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}}\right) \qquad (3.59)$$

However, this assumption does not always hold: the portfolio $w$ may not be fine-grained
and may present some concentration issues. In this case, the one-year value-at-risk is equal to
the quantile $\alpha$ of the loss distribution:

$$\mathrm{VaR}_\alpha(w) = F_L^{-1}(\alpha)$$

111 Without any loss of generality, we assume that $T_i = 1$ in the sequel.


The granularity adjustment GA is the difference between the two risk measures. In the case
of the VaR and UL credit risk measures, we obtain:

$$\mathrm{GA} = \mathrm{VaR}_\alpha(w) - \mathrm{VaR}_\alpha(w_{IFG})$$

In most cases, we expect that the granularity adjustment is positive, meaning that the
IRB-based capital underestimates the credit risk of the portfolio.
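As an illustration, Equation (3.59) and the definition of GA translate directly into code. The sketch below assumes a one-year horizon and a single copula correlation for all credits; the function names `var_ifg` and `granularity_adjustment` and the tuple layout of `exposures` are hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()


def var_ifg(exposures, alpha, rho):
    """Value-at-risk of an infinitely fine-grained portfolio, Equation (3.59).

    exposures: iterable of (EAD_i, E[LGD_i], PD_i) tuples.
    """
    total = 0.0
    for ead, lgd, pd in exposures:
        stressed_pd = N.cdf((N.inv_cdf(pd) + sqrt(rho) * N.inv_cdf(alpha))
                            / sqrt(1.0 - rho))
        total += ead * lgd * stressed_pd
    return total


def granularity_adjustment(var_true, exposures, alpha, rho):
    """GA = VaR(w) - VaR(w_IFG), where var_true is computed elsewhere."""
    return var_true - var_ifg(exposures, alpha, rho)


portfolio = [(1.0, 0.5, 0.10)] * 50  # 50 identical credits
print(var_ifg(portfolio, 0.90, 0.10))
```

When $\rho = 0$, the stressed default probability reduces to PD and the function returns the expected loss of the portfolio.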


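The case of a perfectly concentrated portfolio Let us consider a portfolio that is
composed of one credit. We have:

$$L = \mathrm{EAD} \cdot \mathrm{LGD} \cdot \mathbb{1}\{\tau \le T\}$$

Let $G$ be the distribution function of the loss given default. It follows that:

$$F_L(\ell) = \Pr\{\mathrm{EAD} \cdot \mathrm{LGD} \cdot \mathbb{1}\{\tau \le T\} \le \ell\}$$

Since we have $\ell = 0 \Leftrightarrow \tau > T$, we deduce that $F_L(0) = \Pr\{\tau > T\} = 1 - \mathrm{PD}$. If $\ell \neq 0$,
this implies that the default has occurred and we have:

$$F_L(\ell) = F_L(0) + \Pr\{\mathrm{EAD} \cdot \mathrm{LGD} \le \ell \mid \tau \le T\} \cdot \Pr\{\tau \le T\} = (1 - \mathrm{PD}) + \mathrm{PD} \cdot G\left(\frac{\ell}{\mathrm{EAD}}\right)$$

The value-at-risk of this portfolio is then equal to:

$$\mathrm{VaR}_\alpha(w) = \begin{cases} \mathrm{EAD} \cdot G^{-1}\left(\dfrac{\alpha + \mathrm{PD} - 1}{\mathrm{PD}}\right) & \text{if } \alpha \ge 1 - \mathrm{PD} \\ 0 & \text{otherwise} \end{cases}$$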
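The closed-form value-at-risk of a single credit is easy to evaluate. In this minimal sketch, `single_loan_var` and `constant_lgd` are hypothetical names, and the quantile function $G^{-1}$ is passed as an argument: the identity corresponds to LGD ~ U[0,1], and a constant function to a fixed loss given default.

```python
def single_loan_var(alpha, pd, ead=1.0, lgd_quantile=lambda u: u):
    """VaR of a one-credit portfolio; lgd_quantile is G^{-1}.

    The default quantile function (identity) corresponds to LGD ~ U[0, 1].
    """
    if alpha <= 1.0 - pd:
        # F_L(0) = 1 - PD already reaches the confidence level
        return 0.0
    return ead * lgd_quantile((alpha + pd - 1.0) / pd)


def constant_lgd(u):
    return 0.5  # G^{-1} when LGD is fixed at 50%


print(single_loan_var(0.999, 0.10))                # uniform LGD
print(single_loan_var(0.999, 0.10, 1.0, constant_lgd))  # constant LGD
print(single_loan_var(0.999, 0.0005))              # PD below 1 - alpha
```

With PD = 10% and $\alpha$ = 99.9%, the uniform case gives 0.99 against 0.5 for the constant case, which illustrates why a stochastic loss given default increases the value-at-risk except for low default probabilities.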


In Figure 3.56, we consider an illustration when the exposure at default is equal to one.
The first panel compares the value-at-risk $\mathrm{VaR}_\alpha(w)$ when LGD ~ U[0,1] and LGD =
50%. Except for low default probabilities, $\mathrm{VaR}_\alpha(w)$ is larger when the loss given default
is stochastic than when the loss given default is set to the mean $\mathbb{E}[\mathrm{LGD}]$. The next panels
also show that the IRB value-at-risk $\mathrm{VaR}_\alpha(w_{IFG})$ underestimates the true value-at-risk
$\mathrm{VaR}_\alpha(w)$ when PD is high. We conclude that the granularity adjustment depends on two
main factors:


• the discrepancy between LGD and its expectation $\mathbb{E}[\mathrm{LGD}]$;


• the specific risk that can increase or decrease¹¹² the credit risk of the portfolio.


The diversification effect and the default correlation We also notice that the granularity
adjustment is equal to zero when the default correlation tends to one:

$$\lim_{\rho \to 1} \mathrm{VaR}_\alpha(w) = \mathrm{VaR}_\alpha(w_{IFG})$$

112 For instance, the true value-at-risk can be lower than the sum of IRB contributions for well-rated
portfolios.

FIGURE 3.56: Comparison between the 99.9% value-at-risk of a loan and its risk contribution
in an IFG portfolio

Indeed, when $\rho = 1$, there is no diversification effect. To illustrate this property, we report
the loss distribution of an infinitely fine-grained portfolio¹¹³ in Figure 3.57. When the
correlation is equal to zero, the conditional expected loss does not depend on $X$ and we
have:

$$L = \mathbb{E}[L \mid X] = \mathrm{EAD} \cdot \mathrm{LGD} \cdot \mathrm{PD}$$

When the correlation is different from zero, we have:

$$\mathbb{E}[L \mid X] > \mathbb{E}[L] \text{ for low values of } X$$
$$\mathbb{E}[L \mid X] < \mathbb{E}[L] \text{ for high values of } X$$

Since the value-at-risk considers a bad economic scenario, it is normal that the value-at-risk
increases with respect to $\rho$ because $\mathbb{E}[L \mid X]$ is an increasing function of $\rho$ in bad economic
times.

In Figure 3.58, we compare the normalized loss distribution¹¹⁴ of non-fine-grained,
but homogeneous portfolios. We notice that the loss distribution of the portfolio converges
rapidly to the loss distribution of the IFG portfolio. It suffices that the number of credits is
larger than 50. However, this result assumes that the portfolio is homogeneous. In the case of
a non-homogeneous portfolio, it is extremely difficult to define a rule to know whether the portfolio
is fine-grained or not.


113 This is a homogeneous portfolio of 50 credits with the following characteristics: EAD = 1, $\mathbb{E}[\mathrm{LGD}]$ =
50% and PD = 10%.

114 This is the loss of the portfolio divided by the number n of credits in the portfolio.

FIGURE 3.57: Loss distribution of an IFG portfolio

FIGURE 3.58: Comparison of the loss distribution of non-IFG and IFG portfolios

3.3.5.2 Granularity adjustment


Monte Carlo approach The first approach to compute the granularity adjustment is to
estimate the quantile $F_L^{-1}(\alpha)$ of the portfolio loss using the Monte Carlo method. In Table
3.47, we have reported the (relative) granularity adjustment, which is defined as:

$$\mathrm{GA}^\star = \frac{F_L^{-1}(\alpha) - \mathrm{VaR}_\alpha(w_{IFG})}{\mathrm{VaR}_\alpha(w_{IFG})}$$

for different homogeneous credit portfolios when EAD = 1. We consider different values of
the default probability PD (1% and 10%), the size $n$ of the portfolio (50, 100 and 500) and
the confidence level $\alpha$ of the value-at-risk (90%, 99% and 99.9%). For the loss given default,
we consider two cases: LGD = 50% and LGD ~ U[0,1]. For each set of parameters, we
use 10 million simulations for estimating the quantile $F_L^{-1}(\alpha)$ and the same seed for the
random number generator¹¹⁵ in order to compare the results. For example, when $n$ = 50,
PD = 10%, $\rho$ = 10%, LGD ~ U[0,1] and $\alpha$ = 90%, we obtain $\mathrm{GA}^\star$ = 13.8%. This means
that the capital charge is underestimated by 13.8% if we consider the IRB formula. We
notice that the granularity adjustment is positive in the different cases we have tested. We
verify that it decreases with respect to the portfolio size. However, it is difficult to draw
other conclusions. For instance, it is not necessarily an increasing function of the confidence
level.
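The Monte Carlo procedure can be sketched as follows for a homogeneous portfolio with a constant loss given default. This is an illustration only: it uses 20,000 simulations instead of the 10 million mentioned above, an arbitrary seed, and hypothetical function names (`simulate_losses`, `relative_ga`). Conditionally on the common factor, defaults are drawn as independent Bernoulli variables, which is equivalent to simulating the latent variables $Z_i$.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist

N = NormalDist()


def simulate_losses(n_credits, pd, rho, lgd, n_sims, seed=123):
    """Portfolio losses (EAD = 1 per credit, constant LGD) in the Basel model."""
    rng = random.Random(seed)
    barrier = N.inv_cdf(pd)
    losses = []
    for _ in range(n_sims):
        x = rng.gauss(0.0, 1.0)  # common factor
        p_x = N.cdf((barrier - sqrt(rho) * x) / sqrt(1.0 - rho))
        n_defaults = sum(1 for _ in range(n_credits) if rng.random() < p_x)
        losses.append(lgd * n_defaults)
    return losses


def relative_ga(n_credits, pd, rho, lgd, alpha, n_sims=20000):
    """GA* = (empirical quantile - IFG value-at-risk) / IFG value-at-risk."""
    losses = sorted(simulate_losses(n_credits, pd, rho, lgd, n_sims))
    var_mc = losses[ceil(alpha * n_sims) - 1]  # empirical alpha-quantile
    stressed_pd = N.cdf((N.inv_cdf(pd) + sqrt(rho) * N.inv_cdf(alpha))
                        / sqrt(1.0 - rho))
    var_ifg = n_credits * lgd * stressed_pd
    return (var_mc - var_ifg) / var_ifg


ga_star = relative_ga(50, 0.10, 0.10, 0.5, 0.90)
print(f"GA* = {100 * ga_star:.1f}%")
```

Because the loss takes discrete values when LGD is constant, the estimate moves in jumps; a stochastic LGD would require one additional draw per defaulted credit.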


TABLE 3.47: Granularity adjustment GA* (in %)

                          |   LGD ~ U[0,1]     |    LGD = 50%
Parameters           α    |  n=50   100   500  |  n=50   100   500
PD = 10%, ρ = 10%   90%   |  13.8   7.4   1.6  |  12.5   6.8   1.2
                    99%   |  19.3  10.0   2.1  |  13.3   6.2   1.2
                    99.9% |  21.5  10.9   2.3  |  12.2   6.9   1.6
…                   90%   |   8.1   4.2   0.9  |   2.7   2.7   0.0
                    99%   |  10.3   5.3   1.1  |   6.7   4.1   0.6
                    99.9% |  11.3   5.6   1.2  |   6.5   2.8   0.6
…                   90%   |  43.7  23.5   5.0  |  60.1  20.1   4.0
                    99%   |  36.7  18.8   3.9  |  32.9  19.6   3.7
                    99.9% |  30.2  15.3   3.1  |  23.7   9.9   1.7


115 See Chapter 13 on page 787.

Analytical approach Let $w$ be a credit portfolio. We have the following identity:

$$\mathrm{VaR}_\alpha(w) = \mathrm{VaR}_\alpha(w_{IFG}) + \underbrace{\mathrm{VaR}_\alpha(w) - \mathrm{VaR}_\alpha(w_{IFG})}_{\text{Granularity adjustment GA}} \qquad (3.60)$$

The granularity adjustment is then the capital we have to add to the IRB value-at-risk in
order to obtain the true value-at-risk (Wilde, 2001b; Gordy, 2003). Since we have seen that
$\mathrm{VaR}_\alpha(w_{IFG})$ is the conditional expected loss when the risk factor $X$ corresponds to the
quantile $1 - \alpha$, we obtain:

$$\mathrm{VaR}_\alpha(w) = \mathrm{VaR}_\alpha(w_{IFG}) + \mathrm{GA} = \mathbb{E}[L \mid X = x_\alpha] + \mathrm{GA}$$

where $x_\alpha = H^{-1}(1 - \alpha)$ and $H(x)$ is the cumulative distribution function of $X$. In order
to derive the expression of the granularity adjustment, we rewrite Equation (3.60) in terms
of portfolio loss:


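$$L = \mathbb{E}[L \mid X] + (L - \mathbb{E}[L \mid X])$$

Since we have $\mathrm{VaR}_\alpha(w) = F^{-1}(\alpha)$ where $F(\ell)$ is the loss distribution, we deduce that:

$$\mathrm{VaR}_\alpha(w) = \mathrm{VaR}_\alpha(L) = \left.\mathrm{VaR}_\alpha\left(\mathbb{E}[L \mid X] + \eta \, (L - \mathbb{E}[L \mid X])\right)\right|_{\eta = 1}$$

Emmer and Tasche (2005) consider the second-order Taylor expansion of the value-at-risk
around $\eta = 0$, evaluated at $\eta = 1$:

$$\mathrm{VaR}_\alpha(w) \approx \mathrm{VaR}_\alpha(\mathbb{E}[L \mid X]) + \left.\frac{\partial \, \mathrm{VaR}_\alpha\left(\mathbb{E}[L \mid X] + \eta \, (L - \mathbb{E}[L \mid X])\right)}{\partial \, \eta}\right|_{\eta = 0} + \frac{1}{2} \left.\frac{\partial^2 \, \mathrm{VaR}_\alpha\left(\mathbb{E}[L \mid X] + \eta \, (L - \mathbb{E}[L \mid X])\right)}{\partial \, \eta^2}\right|_{\eta = 0}$$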

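Under some assumptions (homogeneous portfolio, regularity of the conditional expected
loss, single factor model, etc.), Wilde (2001b) and Gordy (2003) show that the second-order
Taylor expansion reduces to¹¹⁶:

$$\mathrm{VaR}_\alpha(w) \approx \mu(x_\alpha) - \left.\frac{1}{2 h(x)} \frac{d}{dx}\left(\frac{h(x) \, \upsilon(x)}{\partial_x \mu(x)}\right)\right|_{x = x_\alpha}$$

where $h(x)$ is the probability density function of $X$, $\mu(x)$ is the conditional expected loss
function:

$$\mu(x) = \mathbb{E}[L \mid X = x]$$

and $\upsilon(x)$ is the conditional variance function:

$$\upsilon(x) = \sigma^2(L \mid X = x)$$

Since $\mu(x_\alpha) = \mathrm{VaR}_\alpha(w_{IFG})$, we deduce that:

$$\mathrm{VaR}_\alpha(w) \approx \mathrm{VaR}_\alpha(w_{IFG}) + \mathrm{GA}$$

where the granularity adjustment is equal to:

$$\mathrm{GA} = -\left.\frac{1}{2 h(x)} \frac{d}{dx}\left(\frac{h(x) \, \upsilon(x)}{\partial_x \mu(x)}\right)\right|_{x = x_\alpha} = \frac{1}{2} \upsilon(x_\alpha) \frac{\partial_x^2 \mu(x_\alpha)}{(\partial_x \mu(x_\alpha))^2} - \frac{1}{2} \upsilon(x_\alpha) \frac{\partial_x \ln h(x_\alpha)}{\partial_x \mu(x_\alpha)} - \frac{1}{2} \frac{\partial_x \upsilon(x_\alpha)}{\partial_x \mu(x_\alpha)}$$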

The granularity adjustment has been extensively studied¹¹⁷. Originally, the Basel Committee
proposed to include the granularity adjustment in the first pillar (BCBS, 2001a), but it
finally preferred to move this issue into the second pillar¹¹⁸.


116 In fact, we can show that the first derivative vanishes (Gouriéroux et al., 2000). If we remember the
Euler allocation principle presented on page 105, this is not surprising since $\mathrm{VaR}_\alpha(\mathbb{E}[L \mid X])$ is the sum of
risk contributions and already includes the first-order effects. In this case, only the second-order
effects remain.

117 See for example Gordy (2003, 2004), Gordy and Marrone (2012), Gordy and Lütkebohmert (2013). The
works of Wilde (2001a,b) and Emmer and Tasche (2005) are a good introduction to this topic.

118 See Exercise 3.4.7 on page 253 for a derivation of the original Basel granularity adjustment.




3.4 Exercises 
3.4.1 Single- and multi-name credit default swaps 


1. We assume that the default time $\tau$ follows an exponential distribution with parameter
$\lambda$. Write the cumulative distribution function $F$, the survival function $S$ and the
density function $f$ of the random variable $\tau$. How do we simulate this default time?


2. We consider a CDS 3M with two-year maturity and $1 mn notional principal. The 
recovery rate R is equal to 40% whereas the spread s is equal to 150 bps at the 
inception date. We assume that the protection leg is paid at the default time. 


(a) Give the cash flow chart. What is the P&L of the protection seller A if the 
reference entity does not default? What is the P&L of the protection buyer B if 
the reference entity defaults in one year and two months? 


(b) What is the relationship between $s$, $R$ and $\lambda$? What is the implied one-year
default probability at the inception date?


(c) Seven months later, the CDS spread has increased and is equal to 450 bps.
Estimate the new default probability. The protection buyer B decides to realize
his P&L. For that, he reassigns the CDS contract to the counterparty C. Explain
the offsetting mechanism if the risky PV01 is equal to 1.189.


3. We consider the following CDS spread curves for three reference entities: 


Maturity #1 #2 #3 
6M 130 bps 1280 bps 30 bps 
1Y 135 bps 970 bps 35 bps 
3Y 140 bps 750 bps 50 bps 
5Y 150 bps 600 bps 80 bps 


(a) Define the notion of credit curve. Comment on the previous spread curves.


(b) Using the Merton Model, we estimate that the one-year default probability is 
equal to 2.5% for #1, 5% for #2 and 2% for #3 at a five-year time horizon. 
Which arbitrage position could we consider about the reference entity #2? 


4. We consider a basket of n single-name CDS. 


(a) What is a first-to-default (FtD), a second-to-default (StD) and a last-to-default 
(LtD)? 

(b) Define the notion of default correlation. What is its impact on the three previous 
spreads? 


(c) We assume that $n = 3$. Show the following relationship:

$$s_1^{CDS} + s_2^{CDS} + s_3^{CDS} = s^{FtD} + s^{StD} + s^{LtD}$$

where $s_i^{CDS}$ is the CDS spread of the $i$-th reference entity.


(d) Many professionals and academics believe that the subprime crisis is due to the
use of the Normal copula. Using the results of the previous question, what could
you conclude?

3.4.2 Risk contribution in the Basel II model

1. We note $L$ the portfolio loss of $n$ credits and $w_i$ the exposure at default of the $i$-th
credit. We have:

$$L(w) = w^\top \varepsilon = \sum_{i=1}^n w_i \cdot \varepsilon_i \qquad (3.61)$$

where $\varepsilon_i$ is the unit loss of the $i$-th credit. Let $F$ be the cumulative distribution function
of $L(w)$.


(a) We assume that $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n) \sim \mathcal{N}(0, \Sigma)$. Compute the value-at-risk $\mathrm{VaR}_\alpha(w)$
of the portfolio when the confidence level is equal to $\alpha$.


(b) Deduce the marginal value-at-risk of the $i$-th credit. Define then the risk contribution
$\mathcal{RC}_i$ of the $i$-th credit.


(c) Check that the marginal value-at-risk is equal to:

$$\frac{\partial \, \mathrm{VaR}_\alpha(w)}{\partial \, w_i} = \mathbb{E}\left[\varepsilon_i \mid L(w) = F^{-1}(\alpha)\right]$$

Comment on this result.


2. We consider the Basel II model of credit risk and the value-at-risk risk measure. The
expression of the portfolio loss is given by:

$$L = \sum_{i=1}^n \mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\} \qquad (3.62)$$

(a) Define the different parameters $\mathrm{EAD}_i$, $\mathrm{LGD}_i$, $\tau_i$ and $T_i$. Show that Model (3.62)
can be written as Model (3.61) by identifying $w_i$ and $\varepsilon_i$.


(b) What are the necessary assumptions ($\mathcal{H}$) to obtain this result:

$$\mathbb{E}\left[\varepsilon_i \mid L = F^{-1}(\alpha)\right] = \mathbb{E}[\mathrm{LGD}_i] \cdot \mathbb{E}\left[D_i \mid L = F^{-1}(\alpha)\right]$$

(c) Deduce the risk contribution $\mathcal{RC}_i$ of the $i$-th credit and the value-at-risk of the
credit portfolio.
(d) We assume that the credit $i$ defaults before the maturity $T_i$ if a latent variable
$Z_i$ goes below a barrier $B_i$:

$$\tau_i \le T_i \Leftrightarrow Z_i \le B_i$$

We consider that $Z_i = \sqrt{\rho} \cdot X + \sqrt{1 - \rho} \cdot \varepsilon_i$ where $Z_i$, $X$ and $\varepsilon_i$ are three independent
Gaussian variables $\mathcal{N}(0, 1)$. $X$ is the factor (or the systematic risk) and
$\varepsilon_i$ is the idiosyncratic risk.


i. Interpret the parameter $\rho$.
ii. Calculate the unconditional default probability:

$$p_i = \Pr\{\tau_i \le T_i\}$$

iii. Calculate the conditional default probability:

$$p_i(x) = \Pr\{\tau_i \le T_i \mid X = x\}$$

(e) Show that, under the previous assumptions ($\mathcal{H}$), the risk contribution $\mathcal{RC}_i$ of the
$i$-th credit is:

$$\mathcal{RC}_i = \mathrm{EAD}_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \Phi\left(\frac{\Phi^{-1}(p_i) + \sqrt{\rho} \, \Phi^{-1}(\alpha)}{\sqrt{1 - \rho}}\right) \qquad (3.63)$$

when the risk measure is the value-at-risk.


3. We now assume that the risk measure is the expected shortfall:

$$\mathrm{ES}_\alpha(w) = \mathbb{E}[L \mid L \ge \mathrm{VaR}_\alpha(w)]$$

(a) In the case of the Basel II framework, show that we have:

$$\mathrm{ES}_\alpha(w) = \sum_{i=1}^n \mathrm{EAD}_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \mathbb{E}\left[p_i(X) \mid X \le \Phi^{-1}(1 - \alpha)\right]$$

(b) By using the following result:

$$\int_{-\infty}^{c} \Phi(a + b x) \, \phi(x) \, dx = \Phi_2\left(c, \frac{a}{\sqrt{1 + b^2}}; \frac{-b}{\sqrt{1 + b^2}}\right)$$

where $\Phi_2(x, y; \rho)$ is the cdf of the bivariate Gaussian distribution with correlation
$\rho$ on the space $[-\infty, x] \times [-\infty, y]$, deduce that the risk contribution $\mathcal{RC}_i$ of the $i$-th
credit in the Basel II model is:

$$\mathcal{RC}_i = \mathrm{EAD}_i \cdot \mathbb{E}[\mathrm{LGD}_i] \cdot \frac{C(1 - \alpha, p_i; \sqrt{\rho})}{1 - \alpha} \qquad (3.64)$$

where $C(u_1, u_2; \theta)$ is the Normal copula with parameter $\theta$.

(c) What do the results (3.63) and (3.64) become if the correlation $\rho$ is equal to
zero? Same question if $\rho = 1$.


4. The risk contributions (3.63) and (3.64) were obtained by considering the assumptions 
H and the default model defined in Question 2(d). What are the implications in terms 
of Pillar 2? 


3.4.3 Calibration of the piecewise exponential model 


1. We denote by $F$ and $S$ the distribution and survival functions of the default time $\tau$.
Define the function $S(t)$ and deduce the expression of the associated density function
$f(t)$.


2. Define the hazard rate $\lambda(t)$. Deduce that the exponential model corresponds to the
particular case $\lambda(t) = \lambda$.


3. We assume that the interest rate r is constant. In a continuous-time model, we recall
that the CDS spread is given by the following expression:

$$s(T) = \frac{(1-\mathcal{R}) \cdot \int_0^T e^{-rt} f(t)\,\mathrm{d}t}{\int_0^T e^{-rt} S(t)\,\mathrm{d}t} \tag{3.65}$$

where R is the recovery rate and T is the maturity of the CDS. Find the triangle
relationship when τ ~ E(λ).

4. Let us assume that:

$$\lambda(t) = \begin{cases} \lambda_1 & \text{if } t \le 3 \\ \lambda_2 & \text{if } 3 < t \le 5 \\ \lambda_3 & \text{if } t > 5 \end{cases}$$


(a) Give the expression of the survival function S (t) and calculate the density func- 
tion f (t). Verify that the hazard rate A (t) is a piecewise constant function. 


(b) Find the expression of the CDS spread using Equation (3.65). 


(c) We consider three credit default swaps, whose maturities are respectively equal
to 3, 5 and 7 years. Show that the calibration of the piecewise exponential model
requires solving a set of 3 equations with the unknown variables λ_1, λ_2 and λ_3.
What is the name of this calibration method?

(d) Find an approximated solution when r is equal to zero and λ_m is small. Comment
on this result.


(e) We consider the following numerical application: r = 5%, s (3) = 100 bps, s (5) = 
150 bps, s (7) = 160 bps and R = 40%. Estimate the implied hazard function. 


(f) Using the previous numerical results, simulate the default time with the uniform 
random numbers 0.96, 0.23, 0.90 and 0.80. 


3.4.4 Modeling loss given default 


1. What is the difference between the recovery rate and the loss given default? 


2. We consider a bank that grants 250000 credits per year. The average amount of a 
credit is equal to $50000. We estimate that the average default probability is equal to 
1% and the average recovery rate is equal to 65%. The total annual cost of the litigation 
department is equal to $12.5 mn. Give an estimate of the loss given default.


3. The probability density function of the beta probability distribution B(α, β) is:

$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathfrak{B}(\alpha,\beta)}$$

where $\mathfrak{B}(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,\mathrm{d}t$.

(a) Why is the beta probability distribution a good candidate to model the loss given
default? Which parameter pair (α, β) corresponds to the uniform probability
distribution?

(b) Let us consider a sample (x_1, ..., x_n) of n losses in case of default. Write the log-likelihood
function. Deduce the first-order conditions of the maximum likelihood
estimator.

(c) We recall that the first two moments of the beta probability distribution are:

$$\mathbb{E}[X] = \frac{\alpha}{\alpha+\beta}$$

$$\sigma^2(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Find the method of moments estimator.


4. We consider a risk class C corresponding to a customer/product segmentation specific
to retail banking. A statistical analysis of 1000 loss data available for this risk class
gives the following results:

LGD_k     0%    25%    50%    75%   100%
n_k      100    100    600    100    100

where n_k is the number of observations corresponding to LGD_k.

(a) We consider a portfolio of 100 homogeneous credits, which belong to the risk class
C. The notional is $10 000 whereas the annual default probability is equal to 1%.
Calculate the expected loss of this credit portfolio with a one-year time horizon
if we use the previous empirical distribution to model the LGD parameter.

(b) We assume that the LGD parameter follows a beta distribution B(α, β). Calibrate
the parameters α and β with the method of moments.

(c) We assume that the Basel II model is valid. We consider the portfolio described
in Question 4(a) and calculate the unexpected loss. What is the impact if we
use a uniform probability distribution instead of the calibrated beta probability
distribution? Why does this result hold even if we consider different factors to
model the default time?

3.4.5 Modeling default times with a Markov chain 


We consider a rating system with 4 risk classes (A, B, C and D), where rating D
represents the default. The transition probability matrix with a two-year time horizon is
equal to:

$$P(2) = \begin{pmatrix} 94\% & 3\% & 2\% & 1\% \\ 10\% & 80\% & 5\% & 5\% \\ 10\% & 10\% & 60\% & 20\% \\ 0\% & 0\% & 0\% & 100\% \end{pmatrix}$$

We also have:

$$P(4) = \begin{pmatrix} 88.860\% & 5.420\% & 3.230\% & 2.490\% \\ 17.900\% & 64.800\% & 7.200\% & 10.100\% \\ 16.400\% & 14.300\% & 36.700\% & 32.600\% \\ 0.000\% & 0.000\% & 0.000\% & 100.000\% \end{pmatrix}$$

and:

$$P(6) = \begin{pmatrix} 84.393\% & 7.325\% & 3.986\% & 4.296\% \\ 24.026\% & 53.097\% & 7.918\% & 14.959\% \\ 20.516\% & 15.602\% & 23.063\% & 40.819\% \\ 0.000\% & 0.000\% & 0.000\% & 100.000\% \end{pmatrix}$$

Let us denote by S_A(t), S_B(t) and S_C(t) the survival functions of each risk class A, B and
C.


1. How are the matrices P (4) and P (6) calculated? 


2. Assuming a piecewise exponential model, calibrate the hazard function of each risk
class for 0 < t ≤ 2, 2 < t ≤ 4 and 4 < t ≤ 6.

3. Give the definition of a Markovian generator. How can we estimate the generator Λ
associated with the transition probability matrices? Verify numerically that the direct
estimator is equal to:

$$\hat{\Lambda} = \begin{pmatrix} -3.254 & 1.652 & 1.264 & 0.337 \\ 5.578 & -11.488 & 3.533 & 2.377 \\ 6.215 & 7.108 & -25.916 & 12.593 \\ 0.000 & 0.000 & 0.000 & 0.000 \end{pmatrix} \times 10^{-2}$$

4. In Figure 3.59, we show the hazard function λ(t) deduced from Questions 2 and 3.
Explain how we calculate λ(t) in both cases. Why do we obtain an increasing
curve for rating A, a decreasing curve for rating C and an inverted U-shaped curve
for rating B?


[Figure 3.59: one panel per rating (Rating A, Rating B, ...); horizontal axis: t (in years);
legend: piecewise constant function, Markov generator]

FIGURE 3.59: Hazard function λ(t) (in bps) estimated respectively with the piecewise
exponential model and the Markov generator


3.4.6 Continuous-time modeling of default risk 


We consider a credit rating system with four risk classes (A, B, C and D), where rating
D represents the default. The one-year transition probability matrix is equal to:

$$P = P(1) = \begin{pmatrix} 94\% & 3\% & 2\% & 1\% \\ 10\% & 80\% & 7\% & 3\% \\ 5\% & 15\% & 60\% & 20\% \\ 0\% & 0\% & 0\% & 100\% \end{pmatrix}$$

We denote by S_A(t), S_B(t) and S_C(t) the survival functions of each risk class A, B and
C.
1. Explain how we can calculate the n-year transition probability matrix P(n). Find
the transition probability matrix P(10).

2. Let V = (v_1 ⋮ v_2 ⋮ v_3 ⋮ v_4) and D = diag(λ_1, λ_2, λ_3, λ_4) be the matrices of eigenvectors
and eigenvalues associated with P.
(a) Show that:

$$P(n)\,V = V D^n$$

Deduce a second approach for calculating the n-year transition probability matrix
P(n).
(b) Calculate the eigendecomposition of the transition probability matrix P. Deduce
the transition probability matrix P(10).

3. We assume that the default time follows a piecewise exponential model. Let S_i(n)
and λ_i(n) be the survival function and the hazard rate of a firm whose initial rating
is the state i (A, B or C). Give the expression of S_i(n) and λ_i(n). Show that:

$$\lambda_i(1) = -\ln\left(1 - e_i^\top P\, e_4\right)$$

Calculate S_i(n) and λ_i(n) for n ∈ {0, ..., 10, 50, 100}.


4. Give the definition of a Markov generator. How can we estimate the generator Λ
associated with the transition probability matrices? Give an estimate $\hat{\Lambda}$.

5. Explain how we can calculate the transition probability matrix P(t) for the time
horizon t ≥ 0. Give the theoretical approximation of P(t) based on Taylor expansion.
Calculate the 6-month transition probability matrix.

6. Deduce the expression of S_i(t) and λ_i(t).


3.4.7 Derivation of the original Basel granularity adjustment 


In this exercise, we derive the formula of the granularity adjustment that was proposed
by the Basel Committee in 2001. The mathematical proof follows Chapter 8 (§422 to §457)
of BCBS (2001a) and the works of Wilde (2001a,b) and Gordy (2003, 2004). We encourage
the reader to consult these references carefully. Most of the time, we use the notations of
the Basel Committee¹¹⁹. We consider the Basel model that has been presented in Section
3.2.3.2 on page 169.

1. We consider the normalized loss:

$$L_i = \mathrm{LGD}_i \cdot D_i$$

We assume that the conditional probability of default is given by the CreditRisk+
model (Gordy, 2000):

$$p_i(X) = p_i\,(1 + w_i\,(X - 1))$$

where w_i ∈ (0, 1] is the factor weight and X is the systematic risk factor, which follows
the gamma distribution G(α_g, β_g). Calculate the conditional expected loss¹²⁰:

$$\mu(x) = \mathbb{E}[L_i \mid X = x]$$

and the conditional variance:

$$\upsilon(x) = \sigma^2(L_i \mid X = x)$$

The Basel Committee assumes that (BCBS, 2001a, §447):

$$\sigma(\mathrm{LGD}_i) = \frac{1}{2}\sqrt{E_i\,(1 - E_i)}$$

Deduce that we have the following approximation:

$$\upsilon(x) \approx \left(\frac{1}{4} + \frac{3}{4} E_i\right) \mu(x)$$

¹¹⁹ When they are different, we indicate the changes in footnotes.
¹²⁰ We use the notation E_i = E[LGD_i].


2. Calculate the granularity adjustment function:

$$\beta(x) = -\frac{1}{2h(x)} \frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{h(x)\,\upsilon(x)}{\partial_x \mu(x)}\right)$$


3. In order to maintain the coherency with the IRB formula, the Basel Committee imposes
that the conditional probabilities are the same for the IRB formula (Vasicek
model) and the granularity formula (CreditRisk+ model). Show that:

$$w_i = \frac{1}{(x_\alpha - 1)} \frac{F_i}{p_i}$$

where:

$$F_i = \Phi\left(\frac{\Phi^{-1}(p_i) + \sqrt{\rho}\,\Phi^{-1}(\alpha)}{\sqrt{1-\rho}}\right) - p_i$$

Deduce the expression of β(x).


4. The calibration has been done by assuming that E[X] = 1 and σ(X) = 2 (BCBS,
2001a, §445). Show that:

$$\beta(x_\alpha) = (0.4 + 1.2 \cdot E_i)\left(0.76229640 + 1.0747964 \cdot \frac{p_i}{F_i}\right)$$

We recall that the Basel Committee finds the following expression of β(x_α):

$$\beta(x_\alpha) = (0.4 + 1.2 \cdot E_i)\left(0.76 + 1.10 \cdot \frac{p_i}{F_i}\right)$$

How can we obtain exactly this formula?


5. In order to transform the granularity adjustment function β(x_α) into risk-weighted
assets, the Basel Committee indicates that it uses a scaling factor c = 1.5 (BCBS,
2001a, §457). Moreover, the Basel Committee explains that “the baseline IRB risk-weights
for non-retail assets (i.e. the RWA before granularity adjustment) incorporate
a margin of 4% to cover average granularity”. Let w* be the equivalent homogeneous
portfolio of the current portfolio w. Show that the granularity adjustment is equal
to¹²¹:

$$\mathrm{GA} = \frac{\mathrm{EAD}^\star}{n^\star} \cdot \mathrm{GSF} - 0.04 \cdot \mathrm{RWA}_{\mathrm{NR}}$$

where RWA_NR are the risk-weighted assets for non-retail assets and:

$$\mathrm{GSF} = (0.6 + 1.8 \cdot E^\star)\left(9.5 + 13.75 \cdot \frac{p^\star}{F^\star}\right)$$

¹²¹ The Basel Committee uses the notation $\square_{AG}$ instead of $\square^\star$ for the equivalent homogeneous portfolio.
The global exposure EAD* corresponds to the variable TNRE (total non-retail exposure) of the Basel
Committee.


6. The Basel Committee considers the following definition of the portfolio loss:

$$L = \sum_{j=1}^{n_{\mathcal{C}}} \sum_{i \in \mathcal{C}_j} \mathrm{EAD}_i \cdot \mathrm{LGD}_i \cdot D_i$$

where C_j is the j-th class of risk. Find the equivalent homogeneous portfolio w* of size
n* and exposure EAD*. Calibrate the parameters p*, E* and σ(LGD*).

7. Using the notations of BCBS (2001a), summarize the different steps for computing
the original Basel granularity adjustment.


3.4.8 Variance of the conditional portfolio loss 


The portfolio loss is given by:

$$L = \sum_{i=1}^{n} w_i \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\}$$

where w_i is the exposure at default of the i-th credit, LGD_i is the loss given default, T_i is the
residual maturity and D_i = 1{τ_i ≤ T_i} is the default indicator function. We suppose the
assumptions of the Basel II model are satisfied. We note D_i(X) and p_i(X) the conditional
default indicator function and the conditional default probability with respect to the risk
factor X.

1. Define D_i(X). Calculate E[D_i(X)], E[D_i²(X)] and E[D_i(X) D_j(X)].

2. Define the conditional portfolio loss L(X).

3. Calculate the expectation of L(X).

4. Show that the variance of L(X) is equal to:

$$\sigma^2(L(X)) = \sum_{i=1}^{n} w_i^2 \left(\mathbb{E}[D_i(X)]\,\sigma^2(\mathrm{LGD}_i) + \mathbb{E}^2[\mathrm{LGD}_i]\,\sigma^2(D_i(X))\right)$$




Chapter 4


Counterparty Credit Risk and Collateral Risk


Counterparty credit risk and collateral risk are other forms of credit risk, where the un- 
derlying credit risk is not directly generated by the economic objective of the financial 
transaction. Therefore, it can reduce the P&L of the portfolio and create a loss even if the 
business objective is reached. A typical example is the purchase transaction of a credit de- 
fault swap. In this case, we have previously seen that the protection buyer is hedged against 
the credit risk if the reference entity defaults. This is partially true, because the protection 
buyer faces the risk that the protection seller also defaults. In this example, we see that the 
total P&L of the financial transaction is the direct P&L of the economic objective minus the 
potential loss due to the transaction settlement. Another example concerns the collateral 
risk, since the P&L of the financial transaction is directly affected by the mark-to-market 
of the collateral portfolio. 


In this chapter, we study the counterparty credit risk (CCR) and show its computation.
We also focus on the regulatory framework that has evolved considerably since the collapse of
the LTCM hedge fund in 1998, which shocked the entire financial system, not because of
the investor losses, but because of the indirect losses generated by the counterparty credit
risk¹. The second section is dedicated to the credit valuation adjustment (CVA), which
can be considered as the ‘little brother’ of the CCR. This risk was mainly identified
with the bankruptcy of Lehman Brothers, which highlighted the market risk of CCR.
Finally, the third section reviews different topics associated with collateral risk management,
particularly in the repo markets.


4.1 Counterparty credit risk 


We generally make the distinction between credit risk (CR) and counterparty credit risk 
(CCR). The counterparty credit risk on market transactions is the risk that the counterparty 
could default before the final settlement of the transaction’s cash flows. For instance, if the 
bank buys a CDS protection on a firm and the seller of the CDS protection defaults before 
the maturity of the contract, the bank could not be hedged against the default of the firm. 
Another example of CCR is the delivery/settlement risk. Indeed, few financial transactions 
are settled on the same-day basis and the difference between the payment date and the 
delivery date is generally between one and five business days. There is then a counterparty 
credit risk if one counterparty defaults when the payment date is not synchronized with the 
delivery date. This settlement risk is low when it is expressed as a percent of the notional 
amount because the maturity mismatch is short, but it concerns large amounts from an 
aggregate point of view. In a similar way, when an OTC contract has a positive mark-to-market,
the bank suffers a loss if the counterparty defaults. To reduce this risk, the bank
can put in place bilateral netting agreements. We note that this risk disappears (or more
precisely decreases) when the bank uses an exchange, because the counterparty credit risk is
transferred to the central counterparty clearing house, which guarantees the expected cash
flows.

¹ Chapter 8 on page 453 describes the impact of the LTCM bankruptcy on systemic risk.


4.1.1 Definition 


BCBS (2004a) measures the counterparty credit risk by the replacement cost of the OTC
derivative. Let us consider two banks A and B that have entered into an OTC contract C.
We assume that Bank B defaults before the maturity of the contract. According to
Pykhtin and Zhu (2006), Bank A can then face two situations:

• The current value of the contract C is negative. In this case, Bank A closes out the
position and pays the market value of the contract to Bank B. To replace the contract
C, Bank A can enter with another counterparty C into a similar contract C′. For that,
Bank A receives the market value of the contract C′ and the loss of the bank is equal
to zero.

• The current value of the contract C is positive. In this case, Bank A closes out the
position, but receives nothing from Bank B. To replace the contract, Bank A can
enter with another counterparty C into a similar contract C′. For that, Bank A pays
the market value of the contract C′ to C. In this case, the loss of the bank is exactly
equal to the market value.


We note that the counterparty exposure is then the maximum of the market value and 
zero. Moreover, the counterparty credit risk differs from the credit risk by two main aspects 
(Canabarro and Duffie, 2003): 


1. The counterparty credit risk is bilateral, meaning that both counterparties may face 
losses. In the previous example, Bank B is also exposed to the risk that Bank A 
defaults. 


2. The exposure at default is uncertain, because we do not know what the replacement
cost of the contract will be when the counterparty defaults.


Using the notations introduced in the previous chapter, we deduce that the credit loss of 
an OTC portfolio is: 


$$L = \sum_{i=1}^{n} \mathrm{EAD}_i(\tau_i) \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\}$$

This is the formula of a credit portfolio loss, except that the exposure at default is random 
and depends on different factors: the default time of the counterparty, the evolution of 
market risk factors and the correlation between the market value of the OTC contract and 
the default of the counterparty. 

Let MtM (t) be the mark-to-market value of the OTC contract at time t. The exposure 
at default is defined as: 

$$\mathrm{EAD} = \max(\mathrm{MtM}(\tau), 0)$$


If we consider a portfolio of OTC derivatives with the same counterparty entity, the exposure 
at default is the sum of positive market values: 


$$\mathrm{EAD} = \sum_{i=1}^{n} \max(\mathrm{MtM}_i(\tau), 0)$$

This is why the bank may be interested in putting in place a global netting agreement:

$$\mathrm{EAD} = \max\left(\sum_{i=1}^{n} \mathrm{MtM}_i(\tau), 0\right) \le \sum_{i=1}^{n} \max(\mathrm{MtM}_i(\tau), 0)$$


In practice, it is extremely complicated and rare that two counterparties succeed in signing 
such agreement. Most of the time, there are several netting agreements on different trading 
perimeters (equities, bonds, interest rate swaps, etc.). In this case, the exposure at default 
is: 


$$\mathrm{EAD} = \sum_{k} \max\left(\sum_{i \in \mathcal{N}_k} \mathrm{MtM}_i(\tau), 0\right) + \sum_{i \notin \cup \mathcal{N}_k} \max(\mathrm{MtM}_i(\tau), 0)$$

where N_k corresponds to the k-th netting arrangement and defines a netting set. Since
the default of Lehman Brothers, we observe a strong development of (global and partial)
netting agreements in order to reduce potential losses, but also the capital charge induced
by counterparty credit risk.


Example 43 Banks A and B have traded five OTC products, whose mark-to-market values²
are given in the table below:

t       1     2     3     4     5     6     7     8
C_1     5     5     3     0    -4     0     5     8
C_2    -5    10     5    -3    -2    -8    -7   -10
C_3     0     2    -3    -4    -6    -3     0     5
C_4     2    -5    -5    -5     2     3     5     7
C_5    -1    -3    -4    -5    -7    -6    -7    -6

If we suppose that there is no netting agreement, the counterparty exposure of Bank
A corresponds to the second row in Table 4.1. We notice that the exposure changes over
time. If there is a netting agreement, we obtain lower exposures. We now consider a more
complicated situation. We assume that Banks A and B have two netting agreements: one
on equity OTC contracts (C_1 and C_2) and one on fixed income OTC contracts (C_3 and C_4).
In this case, we obtain the results given in the last row in Table 4.1. For instance, the exposure
at default for t = 8 is calculated as follows:

$$\mathrm{EAD} = \max(8 - 10, 0) + \max(5 + 7, 0) + \max(-6, 0) = 12$$

TABLE 4.1: Counterparty exposure of Bank A

t                  1     2     3     4     5     6     7     8
No netting         7    17     8     0     2     3    10    20
Global netting     1     9     0     0     0     0     0     4
Partial netting    2    15     8     0     0     0     5    12

If we consider Bank B, the counterparty exposure is given in Table 4.2. This illustrates the 
bilateral nature of the counterparty credit risk. Indeed, except if there is a global netting 
arrangement, both banks have a positive counterparty exposure. 


² They are calculated from the viewpoint of Bank A.

TABLE 4.2: Counterparty exposure of Bank B

t                  1     2     3     4     5     6     7     8
No netting         6     8    12    17    19    17    14    16
Global netting     0     0     4    17    17    14     4     0
Partial netting    1     6    12    17    17    14     9     8
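
The three netting rules of Example 43 can be checked with a few lines of code. The sketch below is only an illustration: the function name `exposure` and the array layout are our own choices, and the mark-to-market signs are those that agree with both Tables 4.1 and 4.2. Bank B's exposures are obtained by flipping the sign of every mark-to-market value, as assumed in the example.

```python
import numpy as np

# Mark-to-market values of the five contracts (rows) at dates t = 1..8 (columns),
# seen from Bank A. Signs are chosen to agree with Tables 4.1 and 4.2.
mtm_a = np.array([
    [ 5,  5,  3,  0, -4,  0,  5,   8],   # C_1 (equity)
    [-5, 10,  5, -3, -2, -8, -7, -10],   # C_2 (equity)
    [ 0,  2, -3, -4, -6, -3,  0,   5],   # C_3 (fixed income)
    [ 2, -5, -5, -5,  2,  3,  5,   7],   # C_4 (fixed income)
    [-1, -3, -4, -5, -7, -6, -7,  -6],   # C_5 (outside any partial netting set)
])


def exposure(mtm, netting_sets):
    """Exposure at default per date.

    Contracts inside a netting set are summed before flooring at zero;
    contracts that belong to no netting set are floored individually.
    """
    covered = {i for ns in netting_sets for i in ns}
    ead = np.zeros(mtm.shape[1])
    for ns in netting_sets:
        ead += np.maximum(mtm[list(ns)].sum(axis=0), 0)
    for i in range(mtm.shape[0]):
        if i not in covered:
            ead += np.maximum(mtm[i], 0)
    return ead.astype(int)


no_netting = []                      # every contract stands alone
global_netting = [range(5)]          # one netting set with all contracts
partial_netting = [(0, 1), (2, 3)]   # equity set and fixed income set

for label, sets in [("No netting", no_netting),
                    ("Global netting", global_netting),
                    ("Partial netting", partial_netting)]:
    print(f"{label:16s} A: {exposure(mtm_a, sets)}  B: {exposure(-mtm_a, sets)}")
```

With global netting, at most one of the two banks has a positive exposure at each date, which is the bilateral property discussed above.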


Remark 49 In the previous example, we have assumed that the mark-to-market value of
the OTC contract for one bank is exactly the opposite of the mark-to-market value for the
other bank. In practice, banks calculate mark-to-model prices, implying that they can differ
from one bank to another.


4.1.2 Modeling the exposure at default 


In order to understand the counterparty credit risk, we begin with an example and illustrate
the time-varying property of the exposure at default. Then, we introduce the different
statistical measures that are useful for characterizing the EAD and show how to calculate
them.


4.1.2.1 An illustrative example 


Example 44 We consider a bank that buys 1000 ATM call options, whose maturity is one
year. The current value of the underlying asset is equal to $100. We assume that the interest
rate r and the cost-of-carry parameter b are equal to 5%. Moreover, the implied volatility of
the option is considered as a constant and is equal to 20%.

By considering the previous parameters, the value C_0 of the call option³ is equal to
$10.45. At time t, the mark-to-market of this derivative exposure is defined by:

$$\mathrm{MtM}(t) = n_C \cdot (C(t) - C_0)$$

where n_C and C(t) are the number and the value of call options. Let e(t) be the exposure
at default. We have:

$$e(t) = \max(\mathrm{MtM}(t), 0)$$


At the initial date of the trade, the mark-to-market value and the counterparty exposure 
are zero. When t > 0, the mark-to-market value is not equal to zero, implying that the 
counterparty exposure e(t) may be positive. In Table 4.3, we have reported the values 
taken by C(t), MtM (t) and e(t) for two scenarios of the underlying price S (t). If we 
consider the first scenario, the counterparty exposure is equal to zero during the first three 
months, because the mark-to-market value is negative. The counterparty exposure is then 
positive for the next four months. For instance, it is equal to $2519 at the end of the fourth
month⁴. In the case of the second scenario, the counterparty exposure is always equal to zero
except for two months. Therefore, we notice that the counterparty exposure is time-varying
and depends on the trajectory of the underlying price. This implies that the counterparty
exposure cannot be calculated once and for all at the initial date of the trade. Indeed, the
counterparty exposure changes with time. Moreover, we do not know what the future price
of the underlying asset will be. That is why we are going to simulate it.

³ We use the Black-Scholes formula given by Equation (2.10) on page 94 to price the option.
⁴ We have:
MtM(t) = 1000 × (12.969 − 10.450) = $2519
TABLE 4.3: Mark-to-market and counterparty exposure of the call option

                 Scenario #1                      Scenario #2
 t      S(t)    C(t)   MtM(t)   e(t)     S(t)    C(t)   MtM(t)   e(t)
 1M    97.58    8.44    -2013      0    91.63    5.36    -5092      0
 2M    98.19    8.25    -2199      0    89.17    3.89    -6564      0
 3M    95.59    6.26    -4188      0    97.60    7.35    -3099      0
 4M   106.97   12.97     2519   2519    97.59    6.77    -3683      0
 5M   104.95   10.83      382    382    96.29    5.48    -4970      0
 6M   110.73   14.68     4232   4232    97.14    5.29    -5157      0
 7M   113.20   16.15     5700   5700   107.71   11.55     1098   1098
 8M   102.04    6.69    -3761      0   105.71    9.27    -1182      0
 9M   115.76   17.25     6802   6802   107.87   10.18     -272      0
10M   103.58    5.96    -4487      0   108.40    9.82     -630      0
11M   104.28    5.41    -5043      0   104.68    5.73    -4720      0
 1Y   104.80    4.80    -5646      0   115.46   15.46     5013   5013
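
The entries of Table 4.3 follow from repricing the option at each date with the remaining maturity. The sketch below is a minimal illustration under the parameters of Example 44; the helper names `norm_cdf` and `bs_call` are ours, and the pricing formula is the Black-Scholes formula with cost-of-carry b, which we assume matches the equation cited in the text. It reprices the option at inception and at the end of the fourth month of Scenario #1.

```python
import math


def norm_cdf(x):
    """Standard normal cdf computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, tau, r, b, sigma):
    """Black-Scholes call price with cost-of-carry b and time to maturity tau."""
    if tau <= 0.0:
        return max(s - k, 0.0)  # payoff at maturity
    d1 = (math.log(s / k) + (b + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * math.exp((b - r) * tau) * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


# Parameters of Example 44
k, r, b, sigma, maturity, n_c = 100.0, 0.05, 0.05, 0.20, 1.0, 1000

c0 = bs_call(100.0, k, maturity, r, b, sigma)                  # price at inception
c_4m = bs_call(106.97, k, maturity - 4.0 / 12.0, r, b, sigma)  # Scenario #1, end of month 4
mtm_4m = n_c * (c_4m - c0)                                     # mark-to-market
e_4m = max(mtm_4m, 0.0)                                        # counterparty exposure

print(f"C0 = {c0:.2f}, C(4M) = {c_4m:.2f}, MtM = {mtm_4m:.0f}, e = {e_4m:.0f}")
```

Small differences with the table can appear because the underlying prices reported in the table are rounded to two decimals.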


We note MtM(t_1; t_2) the mark-to-market value between dates t_1 and t_2. By construction,
we have:

$$\mathrm{MtM}(0; t) = \mathrm{MtM}(0; t_0) + \mathrm{MtM}(t_0; t)$$

where 0 is the initial date of the trade, t_0 is the current date and t is the future date. This
implies that the mark-to-market value at time t has two components:


1. the current mark-to-market value MtM(0; t_0) that depends on the past trajectory of
the underlying price;


2. and the future mark-to-market value MtM(t_0; t) that depends on the future trajectory
of the underlying price.


In order to evaluate the second component, we need to define the probability distribution of 
S (t). In our example, we can assume that the underlying price follows a geometric Brownian 
motion: 


$$\mathrm{d}S(t) = \mu S(t)\,\mathrm{d}t + \sigma S(t)\,\mathrm{d}W(t)$$

We face here an issue because we have to define the parameters μ and σ. There are two
approaches:

1. the first method uses the historical probability measure P, meaning that the parameters
μ and σ are estimated using historical data;

2. the second method considers the risk-neutral probability measure Q, which is used to
price the OTC derivative.

While the first approach is more relevant to calculate the counterparty exposure, the second
approach is more frequent because it is easier for a bank to implement it. Indeed, Q is already
available because of the hedging portfolio, which is not the case of P. In our example, this is
equivalent to setting μ and σ to their historical estimates $\hat{\mu}$ and $\hat{\sigma}$ if we consider the historical
probability measure P, while they are equal to the interest rate r and the implied volatility
Σ if we consider the risk-neutral probability measure Q.

FIGURE 4.1: Probability density function of the counterparty exposure after six months
(top panel: simulated scenarios)

In Figure 4.1, we report an illustration of scenario generation when the current date t_0 is
6 months. This means that the trajectory of the asset price S(t) is given when t ≤ t_0 whereas
it is simulated when t > t_0. At time t_0 = 0.5, the asset price is equal to $114.77. We deduce
that the option price C(t_0) is equal to $18.17. The mark-to-market value is then positive
and equal to $7716. Using 10000 simulated scenarios, we estimate the probability density
function of the mark-to-market value MtM(0; 1) at the maturity date (bottom/left panel in
Figure 4.1) and deduce the probability density function of the counterparty exposure e(1)
(bottom/right panel in Figure 4.1). We notice that the probability of obtaining a negative
mark-to-market at the maturity date is significant. Indeed, it is equal to 36% because
6 months remain and the asset price may decrease sufficiently. Of course, this probability
depends on the parameters used for simulating the trajectories, especially the trend μ. Using
a risk-neutral approach has the advantage of limiting the impact of this coefficient.
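
The scenario generation described above can be sketched as follows. This is a minimal illustration, not the exact procedure used to produce the figure: we assume the risk-neutral measure (drift r, volatility 20%), simulate the terminal price directly from its lognormal distribution, and choose an arbitrary random seed. Variable names are ours.

```python
import math
import numpy as np

# State at the current date t0 = 0.5 (values quoted in the text)
s_t0, k, r, sigma = 114.77, 100.0, 0.05, 0.20
c0, n_c = 10.45, 1000          # initial option price and number of options
tau = 0.5                      # remaining time to maturity
n_sims = 10_000

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only
z = rng.standard_normal(n_sims)

# Exact simulation of the geometric Brownian motion at maturity under Q
s_T = s_t0 * np.exp((r - 0.5 * sigma ** 2) * tau + sigma * math.sqrt(tau) * z)

# Mark-to-market MtM(0;1) and counterparty exposure e(1) at maturity
mtm_T = n_c * (np.maximum(s_T - k, 0.0) - c0)
e_T = np.maximum(mtm_T, 0.0)

prob_negative = float(np.mean(mtm_T < 0.0))
print(f"Pr(MtM(0;1) < 0) ~ {prob_negative:.1%}")
print(f"Worst-case MtM   = {mtm_T.min():.0f}")
```

Under these assumptions the estimated probability lands in the neighborhood of the 36% quoted above; the exact figure depends on the random draws and on the drift retained for the simulation.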


Remark 50 The mark-to-market value presents a very high skew, because it is bounded.
Indeed, the worst-case scenario is reached when the asset price S(1) is lower than the strike
K = 100. In this case, we obtain:

$$\mathrm{MtM}(0; 1) = 1000 \times (0 - 10.45) = -\$10\,450$$

We suppose now that the current date is nine months. During the last three months, the
asset price has changed and it is now equal to $129.49. The current counterparty exposure
has then increased and is equal to⁵ $20 294. In Figure 4.2, we observe that the shape of the
probability density function has changed. Indeed, the skew has been highly reduced, because
only three months remain before the maturity date. The price is then sufficiently high
that the probability of obtaining a positive mark-to-market at the settlement date is almost
equal to 100%. This is why the two probability density functions are very similar.


We can use the previous approach of scenario generation in order to represent the evolution
of counterparty exposure. In Figure 4.3, we consider two observed trajectories of the
asset price. For each trajectory, we report the current exposure, the expected exposure and
the 95% quantile of the counterparty exposure at the maturity date. All these counterparty
measures converge at the maturity date, but differ before because of the uncertainty
between the current date and the maturity date.

⁵ Using the previous parameters, the BS price of the call option is now equal to $30.74.

FIGURE 4.2: Probability density function of the counterparty exposure after nine months
(top panel: simulated scenarios)

FIGURE 4.3: Evolution of the counterparty exposure


4.1.2.2 Measuring the counterparty exposure 


We define the counterparty exposure at time t as the random credit exposure⁶:

$$e(t) = \max(\mathrm{MtM}(0; t), 0) \tag{4.1}$$

This counterparty exposure is also known as the potential future exposure (PFE). When
the current date t_0 is not equal to the initial date 0, the counterparty exposure can be
decomposed in two parts:

$$\begin{aligned} e(t) &= \max(\mathrm{MtM}(0; t_0) + \mathrm{MtM}(t_0; t), 0) \\ &= \max(\mathrm{MtM}(0; t_0), 0) + \left(\max(\mathrm{MtM}(0; t_0) + \mathrm{MtM}(t_0; t), 0) - \max(\mathrm{MtM}(0; t_0), 0)\right) \end{aligned}$$

The first component is the current exposure, which is always non-negative:

$$\mathrm{CE}(t_0) = \max(\mathrm{MtM}(0; t_0), 0)$$

The second component is the credit variation between t_0 and t. When the current mark-to-market
value is negative, the second component can only be non-negative. However,
the credit variation may also be negative if the future mark-to-market value is negative.
Let us denote by F_[0,t] the cumulative distribution function of the potential future exposure
e(t). The peak exposure (PE) is the quantile of the counterparty exposure at the confidence
level α:

$$\mathrm{PE}_\alpha(t) = F_{[0,t]}^{-1}(\alpha) = \inf\{x : \Pr\{e(t) \le x\} \ge \alpha\} \tag{4.2}$$

The maximum value of the peak exposure is referred to as the maximum peak exposure⁷
(MPE):

$$\mathrm{MPE}_\alpha(0; t) = \sup_{s} \mathrm{PE}_\alpha(0; s) \tag{4.3}$$

We now introduce the traditional counterparty credit risk measures:

• The expected exposure (EE) is the average of the distribution of the counterparty
exposure at the future date t:

$$\mathrm{EE}(t) = \mathbb{E}[e(t)] = \int_0^\infty x\,\mathrm{d}F_{[0,t]}(x) \tag{4.4}$$

• The expected positive exposure (EPE) is the weighted average over time [0, t] of the
expected exposure:

$$\mathrm{EPE}(0; t) = \mathbb{E}\left[\frac{1}{t}\int_0^t e(s)\,\mathrm{d}s\right] = \frac{1}{t}\int_0^t \mathrm{EE}(s)\,\mathrm{d}s \tag{4.5}$$

• The effective expected exposure (EEE) is the maximum expected exposure that occurs
at the future date t or any prior date:

$$\mathrm{EEE}(t) = \sup_{s \le t} \mathrm{EE}(s) = \max\left(\mathrm{EEE}(t^-), \mathrm{EE}(t)\right) \tag{4.6}$$

• Finally, the effective expected positive exposure (EEPE) is the weighted average over
time [0, t] of the effective expected exposure:

$$\mathrm{EEPE}(0; t) = \frac{1}{t}\int_0^t \mathrm{EEE}(s)\,\mathrm{d}s \tag{4.7}$$

⁶ The definitions introduced in this paragraph come from Canabarro and Duffie (2003) and the Basel II
framework.
⁷ It is also known as the maximum potential future exposure (MPFE).


We can make several observations concerning the previous measures. Some of them are 
defined with respect to a future date t. This is the case of PE, (t), EE (t) and EEE (t). The 
others depend on the time period [0; t], typically a one-year time horizon. Previously, we 
have considered the counterparty measure e (t), which defines the potential future exposure 
between the initial date 0 and the future date t. We can also use other credit measures like 
the potential future exposure between the current date t_0 and the future date t:

$$e(t) = \max(\mathrm{MtM}(t_0; t), 0)$$


The counterparty exposure e (t) can be defined with respect to one contract or to a basket 
of contracts. In this last case, we have to take into account netting arrangements. 


4.1.2.3 Practical implementation for calculating counterparty exposure 


We consider again Example 44 and assume that the current date $t_0$ is the initial date
$t = 0$. Using 50 000 simulations, we have calculated the different credit measures with
respect to the time $t$ and reported them in Figure 4.4. For that, we have used the risk-neutral
probability distribution $\mathbb{Q}$ in order to simulate the trajectory of the asset price
$S(t)$. Let $\{t_0, t_1, \ldots, t_n\}$ be the set of discrete times. We note $n_S$ the number of simulations
and $S_j(t_i)$ the value of the asset price at time $t_i$ for the $j$-th simulation. For each simulated
trajectory, we then calculate the option price $C_j(t_i)$ and the mark-to-market value:

$$\mathrm{MtM}_j(t_i) = n_C \cdot \left(C_j(t_i) - C_0\right)$$

Therefore, we deduce the potential future exposure:

$$e_j(t_i) = \max\left(\mathrm{MtM}_j(t_i), 0\right)$$

The peak exposure at time $t_i$ is estimated using the order statistics:

$$\mathrm{PE}_\alpha(t_i) = e_{\alpha n_S : n_S}(t_i) \tag{4.8}$$

We use the empirical mean to calculate the expected exposure:

$$\mathrm{EE}(t_i) = \frac{1}{n_S} \sum_{j=1}^{n_S} e_j(t_i) \tag{4.9}$$

For the expected positive exposure, we approximate the integral by the following sum:

$$\mathrm{EPE}(0; t_i) = \frac{1}{t_i} \sum_{k=1}^{i} \mathrm{EE}(t_k) \, \Delta t_k \tag{4.10}$$

If we consider a fixed-interval scheme with $\Delta t_k = \Delta t$, we obtain:

$$\mathrm{EPE}(0; t_i) = \frac{\Delta t}{t_i} \sum_{k=1}^{i} \mathrm{EE}(t_k) = \frac{1}{i} \sum_{k=1}^{i} \mathrm{EE}(t_k) \tag{4.11}$$

By definition, the effective expected exposure is given by the following recursive formula:

$$\mathrm{EEE}(t_i) = \max\left(\mathrm{EEE}(t_{i-1}), \mathrm{EE}(t_i)\right) \tag{4.12}$$

where $\mathrm{EEE}(0)$ is initialized with the value $\mathrm{EE}(0)$. Finally, the effective expected positive
exposure is given by:

$$\mathrm{EEPE}(0; t_i) = \frac{1}{t_i} \sum_{k=1}^{i} \mathrm{EEE}(t_k) \, \Delta t_k \tag{4.13}$$

In the case of a fixed-interval scheme, this formula becomes:

$$\mathrm{EEPE}(0; t_i) = \frac{1}{i} \sum_{k=1}^{i} \mathrm{EEE}(t_k) \tag{4.14}$$
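The estimators (4.8) to (4.14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `exposure_profile`, the layout of `mtm_paths` and the toy Brownian mark-to-market used in the demo are hypothetical and do not reproduce Example 44. The values of EPE and EEPE at $t_0 = 0$ are set to EE(0) by convention.

```python
import math
import random


def exposure_profile(mtm_paths, times, alpha=0.95):
    """Estimate PE, EE, EPE, EEE and EEPE from simulated mark-to-market paths.

    mtm_paths[j][i] is the mark-to-market of simulation j at times[i];
    times must be increasing with times[0] == 0.
    """
    n_s = len(mtm_paths)
    pe, ee = [], []
    for i in range(len(times)):
        # Potential future exposures e_j(t_i), sorted to read off order statistics
        e = sorted(max(path[i], 0.0) for path in mtm_paths)
        rank = min(n_s, max(1, math.ceil(alpha * n_s)))
        pe.append(e[rank - 1])           # peak exposure, equation (4.8)
        ee.append(sum(e) / n_s)          # expected exposure, equation (4.9)

    eee = [ee[0]]
    for i in range(1, len(times)):
        eee.append(max(eee[-1], ee[i]))  # recursion (4.12)

    epe, eepe = [ee[0]], [eee[0]]        # convention at t_0 = 0 (assumption)
    acc_ee = acc_eee = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        acc_ee += ee[i] * dt
        acc_eee += eee[i] * dt
        epe.append(acc_ee / times[i])    # equation (4.10)
        eepe.append(acc_eee / times[i])  # equation (4.13)
    return {"PE": pe, "EE": ee, "EPE": epe, "EEE": eee, "EEPE": eepe}


# Toy usage: the mark-to-market follows a standard Brownian motion (assumption)
random.seed(0)
times = [0.0, 0.25, 0.5, 0.75, 1.0]
paths = []
for _ in range(2000):
    w, path = 0.0, [0.0]
    for k in range(1, len(times)):
        w += random.gauss(0.0, math.sqrt(times[k] - times[k - 1]))
        path.append(w)
    paths.append(path)
profile = exposure_profile(paths, times)
print("one-year EEPE:", profile["EEPE"][-1])
```

Because EEE is the running maximum of EE, the EEPE returned by this sketch can never be lower than the EPE computed on the same grid.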


If we consider Figure 4.4, we observe that the counterparty exposure is increasing with
respect to the time horizon⁸. This property is due to the fact that the credit risk evolves
according to a square-root-of-time rule $\sqrt{t}$. In the case of an interest rate swap, the
counterparty exposure takes the form of a bell-shaped curve. In fact, there are two opposite effects
that determine the counterparty exposure (Pykhtin and Zhu, 2007):

• the diffusion effect of risk factors increases the counterparty exposure over time,
because the uncertainty is greater in the future and may produce very large potential
future exposures compared to the current exposure;

• the amortization effect decreases the counterparty exposure over time, because it
reduces the remaining cash flows that are exposed to default.

In Figure 4.5, we have reported the counterparty exposure in the case of an interest rate swap with
a continuous amortization. The peak exposure initially increases because of the diffusion
effect and generally reaches its maximum at one-third of the remaining maturity. It then
decreases because of the amortization effect. This is why it is equal to zero at the maturity
date when the swap is fully amortized.


4.1.3 Regulatory capital 


The Basel II Accord includes three approaches to calculate the capital requirement
for the counterparty credit risk: current exposure method (CEM), standardized method
(SM) and internal model method (IMM). In March 2014, the Basel Committee decided to
replace the non-internal model approaches (CEM and SM) by a more sensitive approach called
the standardized approach (or SA-CCR), which has been implemented since January 2017.

Each approach defines how the exposure at default EAD is calculated. The bank uses
this estimate with the appropriate credit approach (SA or IRB) in order to measure the
capital requirement. In the SA approach, the capital charge is equal to:

$$K = 8\% \cdot \mathrm{EAD} \cdot \mathrm{RW}$$


⁸ This implies that $\mathrm{MPE}_\alpha(0; t) = \mathrm{PE}_\alpha(t)$ and $\mathrm{EEE}(t) = \mathrm{EE}(t)$.

FIGURE 4.4: Counterparty exposure profile of options

FIGURE 4.5: Counterparty exposure profile of interest rate swaps

where RW is the risk weight of the counterparty. In the IRB approach, we recall that:

$$K = \mathrm{EAD} \cdot \mathrm{LGD} \cdot \left( \Phi\left( \frac{\Phi^{-1}(\mathrm{PD}) + \sqrt{\rho(\mathrm{PD})} \, \Phi^{-1}(0.999)}{\sqrt{1 - \rho(\mathrm{PD})}} \right) - \mathrm{PD} \right) \cdot \varphi(M)$$


where LGD and PD are the loss given default and the probability of default, which apply
to the counterparty. The correlation $\rho(\mathrm{PD})$ is calculated using the standard formula (3.35)
given on page 184.


4.1.3.1 Internal model method 
In the internal model method, the exposure at default is calculated as the product of a
scalar $\alpha$ and the one-year effective expected positive exposure⁹:

$$\mathrm{EAD} = \alpha \cdot \mathrm{EEPE}(0; 1)$$

The Basel Committee has set the value $\alpha$ at 1.4. The maturity $M$ used in the IRB formula
is equal to one year if the remaining maturity is less than or equal to one year. Otherwise, it
is calculated as follows¹⁰:

$$M = \min\left( 1 + \frac{\sum_{k} \mathbb{1}\{t_k > 1\} \cdot \mathrm{EE}(t_k) \cdot \Delta t_k \cdot B_0(t_k)}{\sum_{k} \mathbb{1}\{t_k \le 1\} \cdot \mathrm{EEE}(t_k) \cdot \Delta t_k \cdot B_0(t_k)}, 5 \right)$$

Under some conditions, the bank may use its own estimates for $\alpha$. Let LEE be the loan
equivalent exposure such that:

$$K\left(\mathrm{LEE} \cdot \mathrm{LGD} \cdot \mathbb{1}\{\tau \le T\}\right) = K\left(\mathrm{EAD}(\tau) \cdot \mathrm{LGD} \cdot \mathbb{1}\{\tau \le T\}\right) \tag{4.15}$$

The loan equivalent exposure is then the deterministic exposure at default, which gives the
same capital as the random exposure at default $\mathrm{EAD}(\tau)$. Using a one-factor credit risk
model, Canabarro et al. (2003) showed that:

$$\alpha = \frac{\mathrm{LEE}}{\mathrm{EPE}}$$

This is the formula that banks must use in order to estimate $\alpha$, subject to a floor of 1.2.


Example 45 We assume that the one-year effective expected positive exposure with respect 
to a given counterparty is equal to $50.2 mn. 


In Table 4.4, we have reported the required capital $K$ for different values of PD under
the foundation IRB approach. The maturity $M$ is equal to one year and we consider the
45% supervisory factor for the loss given default. The exposure at default is calculated
with $\alpha = 1.4$. We show the impact of the Basel III multiplier applied to the correlation. In
this example, if the default probability of the counterparty is equal to 1%, the multiplier
increases the required capital by 27.77%.
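The figures of Example 45 can be checked with a short script. This is a sketch under assumptions: we take formula (3.35) to be the standard Basel corporate correlation $\rho(\mathrm{PD}) = 12\% \cdot w + 24\% \cdot (1 - w)$ with $w = (1 - e^{-50\,\mathrm{PD}})/(1 - e^{-50})$, and we infer a Basel III multiplier of 1.25 from the ratio of the correlations reported in Table 4.4. The maturity adjustment is omitted because it is equal to one when $M = 1$. The function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def basel_correlation(pd, multiplier=1.0):
    """Corporate asset correlation (assumed to be formula (3.35))."""
    w = (1.0 - exp(-50.0 * pd)) / (1.0 - exp(-50.0))
    return multiplier * (0.12 * w + 0.24 * (1.0 - w))


def irb_capital(ead, lgd, pd, rho):
    """IRB capital charge for M = 1 year (maturity adjustment equal to one)."""
    x = (N.inv_cdf(pd) + sqrt(rho) * N.inv_cdf(0.999)) / sqrt(1.0 - rho)
    return ead * lgd * (N.cdf(x) - pd)


ead = 1.4 * 50.2  # EAD = alpha x EEPE(0; 1), in $ mn
for pd in (0.01, 0.02, 0.03, 0.04, 0.05):
    k2 = irb_capital(ead, 0.45, pd, basel_correlation(pd))
    k3 = irb_capital(ead, 0.45, pd, basel_correlation(pd, multiplier=1.25))
    print(f"PD={pd:.0%}  K(Basel II)={k2:.2f}  K(Basel III)={k3:.2f}  "
          f"increase={k3 / k2 - 1.0:.2%}")
```

The inverse and cumulative normal functions come from `statistics.NormalDist`, so the sketch needs no third-party package.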


⁹ If the remaining maturity $\tau$ of the product is less than one year, the exposure at default becomes:

$$\mathrm{EAD} = \alpha \cdot \mathrm{EEPE}(0; \tau)$$

¹⁰ The maturity has then a cap of five years.

TABLE 4.4: Capital charge of counterparty credit risk under the FIRB approach

            PD                  1%      2%      3%      4%      5%
Basel II    ρ(PD) (in %)     19.28   16.41   14.68   13.62   12.99
            K (in $ mn)       4.12    5.38    6.18    6.82    7.42
Basel III   ρ(PD) (in %)     24.10   20.52   18.35   17.03   16.23
            K (in $ mn)       5.26    6.69    7.55    8.25    8.89
            ΔK (in %)        27.77   24.29   22.26   20.89   19.88


4.1.3.2 Non-internal model methods (Basel II)
Under the current exposure method (CEM), we have:

$$\mathrm{EAD} = \mathrm{CE}(0) + A$$

where $\mathrm{CE}(0)$ is the current exposure and $A$ is the add-on value. In the view of the Basel
Committee, $\mathrm{CE}(0)$ represents the replacement cost, whereas the add-on reflects the
potential future exposure of the contract. For a single OTC transaction, $A$ is the product of
the notional and the add-on factor, which is given in Table 4.5. For a portfolio of OTC
transactions with netting agreements, the exposure at default is the sum of the current net
exposure plus a net add-on value $A_N$, which is defined as follows:

$$A_N = \left(0.4 + 0.6 \cdot \mathrm{NGR}\right) \cdot A_G$$

where $A_G = \sum_i A_i$ is the gross add-on, $A_i$ is the add-on of the $i$-th transaction and NGR is
the ratio between the current net and gross exposures.


TABLE 4.5: Regulatory add-on factors for the current exposure method

Residual    Fixed     FX and    Equity    Precious    Other
maturity    income    gold                metals      commodities
0-1Y         0.0%      1.0%      8.0%      7.0%       10.0%
1Y-5Y        0.5%      5.0%      8.0%      7.0%       12.0%
5Y+          1.5%      7.5%     10.0%      8.0%       15.0%


Example 46 We consider a portfolio of four OTC derivatives, which are traded with the
same counterparty:

Contract                    C_1            C_2            C_3       C_4
Asset class                 Fixed income   Fixed income   Equity    Equity
Notional (in $ mn)          100            40             20        10
Maturity                    2Y             6Y             6M        18M
Mark-to-market (in $ mn)    3.0            -2.0           2.0       -1.0

We assume that there are two netting arrangements: one concerning fixed income derivatives
and another one for equity derivatives.

In the case where there is no netting agreement, we obtain these results:

Contract              C_1    C_2    C_3    C_4    Sum
CE(0) (in $ mn)       3.0    0.0    2.0    0.0    5.0
Add-on (in %)         0.5    1.5    8.0    8.0
A (in $ mn)           0.5    0.6    1.6    0.8    3.5

The exposure at default is then equal to $8.5 mn. If we take into account the two netting
agreements, the current net exposure becomes:

$$\mathrm{CE}(0) = \max(3 - 2, 0) + \max(2 - 1, 0) = \$2 \text{ mn}$$

We deduce that NGR is equal to 2/5 or 40%. It follows that:

$$A_N = (0.4 + 0.6 \times 0.4) \times 3.5 = \$2.24 \text{ mn}$$


Finally, the exposure at default is equal to $4.24 mn. 
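The arithmetic of Example 46 can be reproduced with the following sketch. The tuple layout and variable names are hypothetical; the add-on factors are those used in the example, and NGR is computed at the portfolio level as in the text.

```python
# (contract, netting set, notional in $ mn, add-on factor, mark-to-market in $ mn)
trades = [
    ("C1", "fixed income", 100.0, 0.005, 3.0),
    ("C2", "fixed income", 40.0, 0.015, -2.0),
    ("C3", "equity", 20.0, 0.080, 2.0),
    ("C4", "equity", 10.0, 0.080, -1.0),
]

# Without netting: each contract contributes max(MtM, 0) plus its own add-on
gross_ce = sum(max(mtm, 0.0) for *_, mtm in trades)
gross_addon = sum(notional * factor for _, _, notional, factor, _ in trades)
ead_no_netting = gross_ce + gross_addon

# With netting: mark-to-market values are netted within each netting set
net_mtm = {}
for _, netting_set, _, _, mtm in trades:
    net_mtm[netting_set] = net_mtm.get(netting_set, 0.0) + mtm
net_ce = sum(max(value, 0.0) for value in net_mtm.values())
ngr = net_ce / gross_ce
net_addon = (0.4 + 0.6 * ngr) * gross_addon
ead_netting = net_ce + net_addon

print(f"EAD without netting: {ead_no_netting:.2f}")
print(f"NGR: {ngr:.0%}, net add-on: {net_addon:.2f}, EAD with netting: {ead_netting:.2f}")
```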

The standardized method was designed for banks that do not have the approval to
apply the internal model method, but would like to have a more sensitive approach than
the current exposure method. In this framework, the exposure at default is equal to:

$$\mathrm{EAD} = \beta \cdot \max\left( \sum_i \mathrm{CMV}_i, \; \sum_j \mathrm{CCF}_j \cdot \left| \sum_{i \in j} \mathrm{RPT}_i \right| \right)$$

where $\mathrm{CMV}_i$ is the current market value of transaction $i$, $\mathrm{CCF}_j$ is the supervisory credit
conversion factor with respect to the hedging set $j$ and $\mathrm{RPT}_i$ is the risk position from
transaction $i$. The supervisory scaling factor $\beta$ is set to 1.4. In this approach, the risk
positions have to be grouped into hedging sets, which are defined by similar instruments
(e.g. same commodity, same issuer, same currency, etc.). The risk position $\sum_{i \in j} \mathrm{RPT}_i$ is
the sum of notional values of linear instruments and delta-equivalent notional values of
non-linear instruments, which belong to the hedging set $j$. The credit conversion factors range
from 0.3% to 10%. The initial goal of the Basel Committee was to provide an approach
which mimics the internal model method¹¹. However, the SM approach was never really
used by banks. Indeed, it did not interest advanced banks, which preferred to implement the
IMM, and it was too complicated for the other banks, which used the CEM.


4.1.3.3 SA-CCR method (Basel III) 


The SA-CCR was adopted by the Basel Committee in March 2014 in order to
replace the non-internal model approaches from January 2017. The main motivation of the Basel
Committee was to propose a more risk-sensitive approach, which can easily be implemented:


“Although being more risk-sensitive than the CEM, the SM was also criticized 
for several weaknesses. Like the CEM, it did not differentiate between margined 
and unmargined transactions or sufficiently capture the level of volatilities ob- 
served over stress periods in the last five years. In addition, the definition of 
hedging set led to operational complexity resulting in an inability to implement 
the SM, or implementing it in inconsistent ways” (BCBS, 2014b, page 1). 


¹¹ Indeed, the $\beta$ multiplier coefficient is the equivalent of the $\alpha$ multiplier coefficient, whereas the rest of
the expression can be interpreted as an estimate of the effective expected positive exposure.

The exposure at default under the SA-CCR is defined as follows:

$$\mathrm{EAD} = \alpha \cdot (\mathrm{RC} + \mathrm{PFE})$$

where RC is the replacement cost (or the current exposure), PFE is the potential future
exposure and $\alpha$ is equal to 1.4. We can view this formula as an approximation of the IMM
calculation, meaning that RC + PFE represents a stylized EEPE value. The PFE add-on is
given by:

$$\mathrm{PFE} = \gamma \cdot \sum_{q=1}^{5} A^{(C_q)}$$

where $\gamma$ is the multiplier and $A^{(C_q)}$ is the add-on of the asset class $C_q$ (interest rate, foreign
exchange, credit, equity and commodity). We have:

$$\gamma = \min\left( 1, \; 0.05 + 0.95 \cdot \exp\left( \frac{\mathrm{MtM}}{1.90 \cdot \sum_{q=1}^{5} A^{(C_q)}} \right) \right)$$


where MtM is the mark-to-market value of the derivative transactions minus the haircut
value of net collateral held. We notice that $\gamma$ is equal to 1 when the mark-to-market is
positive and $\gamma \in [5\%, 1]$ when the net mark-to-market is negative. Figure 4.6 shows the
relationship between the ratio $\mathrm{MtM} / \sum_{q=1}^{5} A^{(C_q)}$ and the multiplier $\gamma$. The role of $\gamma$ is
then to reduce the potential future exposure in the case of negative mark-to-market.
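The behavior of the multiplier can be explored with a small function. This is a sketch: the name `pfe_multiplier` and the sample values are hypothetical, while the constants 0.05, 0.95 and 1.90 are those of the formula above.

```python
from math import exp


def pfe_multiplier(mtm, addon_sum):
    """PFE multiplier: equal to 1 for positive MtM, floored at 5% otherwise."""
    return min(1.0, 0.05 + 0.95 * exp(mtm / (1.90 * addon_sum)))


# The multiplier decreases as the net mark-to-market becomes more negative
for ratio in (1.0, 0.0, -0.5, -1.0, -2.0, -5.0):
    print(f"MtM / add-on = {ratio:+.1f}  ->  multiplier = {pfe_multiplier(ratio, 1.0):.4f}")
```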
FIGURE 4.6: Impact of negative mark-to-market on the PFE multiplier


The general steps for calculating the add-on are the following. First, we have to determine
the primary risk factors of each transaction in order to classify the transaction into one or
more asset classes. Second, we calculate an adjusted notional amount $d_i$ at the transaction
level¹² and a maturity factor $\mathrm{MF}_i$, which reflects the time horizon appropriate for this type
of transaction. For unmargined transactions, we have:

$$\mathrm{MF}_i = \sqrt{\min(M_i, 1)}$$

where $M_i$ is the remaining maturity of the transaction and is floored by 10 days. For
margined transactions, we have:

$$\mathrm{MF}_i = \frac{3}{2} \sqrt{M_i^\star}$$

where $M_i^\star$ is the appropriate margin period of risk (MPOR). Then, we apply a supervisory
delta adjustment $\Delta_i$ to each transaction¹³ and a supervisory factor $\mathrm{SF}_j$ to each hedging
set $j$ in order to take the volatility into account. The add-on of one transaction $i$ has then
the following expression:

$$A_i = \mathrm{SF}_j \cdot \left(\Delta_i \cdot d_i \cdot \mathrm{MF}_i\right)$$

¹² The trade-level adjusted notional $d_i$ is defined as the product of current price of one unit and the
number of units for equity and commodity derivatives, the notional of the foreign currency leg converted to
domestic currency for foreign exchange derivatives and the product of the trade notional amount and the
supervisory duration $\mathrm{SD}_i$ for interest rate and credit derivatives. The supervisory duration $\mathrm{SD}_i$ is defined
as follows:

$$\mathrm{SD}_i = 20 \cdot \left( e^{-0.05 \cdot S_i} - e^{-0.05 \cdot E_i} \right)$$

where $S_i$ and $E_i$ are the start and end dates of the time period referenced by the derivative instrument.

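Finally, we apply an aggregation method to calculate the add-on $A^{(C_q)}$ of the asset class $C_q$
by considering correlations between hedging sets. Here are the formulas that determine the
add-on values:

• The add-on for interest rate derivatives is equal to:

$$A^{(\mathrm{ir})} = \sum_j \mathrm{SF}_j \cdot \sqrt{ \sum_{k=1}^{3} \sum_{k'=1}^{3} \rho_{k,k'} \cdot D_{j,k} \cdot D_{j,k'} }$$

where notations $j$ and $k$ refer to currency $j$ and maturity bucket¹⁴ $k$ and the effective
notional $D_{j,k}$ is calculated according to:

$$D_{j,k} = \sum_{i \in (j,k)} \Delta_i \cdot d_i \cdot \mathrm{MF}_i$$

• For foreign exchange derivatives, we obtain:

$$A^{(\mathrm{fx})} = \sum_j \mathrm{SF}_j \cdot \left| \sum_{i \in j} \Delta_i \cdot d_i \cdot \mathrm{MF}_i \right|$$

where the hedging set $j$ refers to currency pair $j$.

• The add-ons for credit and equity derivatives use the same formula:

$$A^{(\mathrm{credit/equity})} = \sqrt{ \left( \sum_k \rho_k \cdot A_k \right)^2 + \sum_k \left(1 - \rho_k^2\right) \cdot A_k^2 }$$

where $k$ represents entity $k$ and:

$$A_k = \mathrm{SF}_k \cdot \sum_{i \in k} \Delta_i \cdot d_i \cdot \mathrm{MF}_i$$

• In the case of commodity derivatives, we have:

$$A^{(\mathrm{commodity})} = \sum_j \sqrt{ \left( \rho_j \cdot \sum_k A_{j,k} \right)^2 + \left(1 - \rho_j^2\right) \cdot \sum_k A_{j,k}^2 }$$

where $j$ indicates the hedging set, $k$ corresponds to the commodity type and:

$$A_{j,k} = \mathrm{SF}_{j,k} \cdot \sum_{i \in (j,k)} \Delta_i \cdot d_i \cdot \mathrm{MF}_i$$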

¹³ For instance, $\Delta_i$ is equal to -1 for a short position, +1 for a long position, the Black-Scholes delta for
an option position, etc.

¹⁴ The three maturity buckets $k$ are (1) less than one year, (2) between one and five years and (3) more
than five years.

TABLE 4.6: Supervisory parameters for the SA-CCR approach

Asset class                          SF_j      ρ_k                     Σ_i
Interest rate      0-1Y              0.50%     100%                    50%
                   1Y-5Y             0.50%      70%   100%             50%
                   5Y+               0.50%      30%    70%   100%      50%
Foreign exchange                     4.00%                             15%
Credit             AAA               0.38%      50%                   100%
                   AA                0.38%      50%                   100%
                   A                 0.42%      50%                   100%
                   BBB               0.54%      50%                   100%
                   BB                1.06%      50%                   100%
                   B                 1.60%      50%                   100%
                   CCC               6.00%      50%                   100%
                   IG index          0.38%      80%                    80%
                   SG index          1.06%      80%                    80%
Equity             Single name      32.00%      50%                   120%
                   Index            20.00%      80%                    75%
Commodity          Electricity      40.00%      40%                   150%
                   Oil & gas        18.00%      40%                    70%
                   Metals           18.00%      40%                    70%
                   Agricultural     18.00%      40%                    70%
                   Other            18.00%      40%                    70%

Source: BCBS (2014b).


For interest rate derivatives, hedging sets correspond to all derivatives in the same currency
(e.g. USD, EUR, JPY). For foreign exchange derivatives, they consist of all currency pairs (e.g. USD/EUR,
USD/JPY, EUR/JPY). For credit and equity, there is a single hedging set, which contains
all the entities (both single names and indices). Finally, there are four hedging sets for
commodity derivatives: energy (electricity, oil & gas), metals, agricultural and other. In
Table 4.6, we give the supervisory parameters¹⁵ for the factor $\mathrm{SF}_j$, the correlation¹⁶ $\rho_k$
and the implied volatility $\Sigma_i$ in order to calculate Black-Scholes delta exposures. We notice
that the value of the supervisory factor can differ within one hedging set. For instance,
it is equal to 0.38% for investment grade (IG) indices, while it takes the value 1.06% for
speculative grade (SG) indices.


Example 47 The netting set consists of four interest rate derivatives¹⁷:

Trade    Instrument      Currency    Maturity    Swap        Notional    MtM
1        IRS             USD         9M          Payer        4           0.10
2        IRS             USD         4Y          Receiver    20          -0.20
3        IRS             USD         10Y         Payer       20           0.70
4        Swaption 10Y    USD         1Y          Receiver     5           0.50

¹⁵ Source: BCBS (2014b).
¹⁶ We notice that we consider cross-correlations between the three time buckets for interest rate derivatives.
¹⁷ For the swaption, the forward swap rate and the strike value are equal to 6% and 5%.

This netting set consists of only one hedging set, because the underlying assets of all
these derivative instruments are USD interest rates. We report the different calculations in
the following table:

i    k    S_i     E_i      SD_i    Δ_i      d_i       MF_i    D_i
1    1    0.00     0.75    0.74     1.00      2.94    0.87      2.55
2    2    0.00     4.00    3.63    -1.00     72.51    1.00    -72.51
3    3    0.00    10.00    7.87     1.00    157.39    1.00    157.39
4    3    1.00    11.00    7.49    -0.27     37.43    1.00    -10.08


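where $k$ indicates the time bucket, $S_i$ is the start date, $E_i$ is the end date, $\mathrm{SD}_i$ is the
supervisory duration, $\Delta_i$ is the delta, $d_i$ is the adjusted notional, $\mathrm{MF}_i$ is the maturity
factor and $D_i$ is the effective notional. For instance, we obtain the following results for the
swaption transaction:

$$\mathrm{SD}_i = 20 \times \left( e^{-0.05 \times 1} - e^{-0.05 \times 11} \right) = 7.49$$

$$\Delta_i = -\Phi\left( -\frac{\ln(6\% / 5\%)}{0.5 \times \sqrt{1}} - \frac{1}{2} \times 0.5 \times \sqrt{1} \right) = -0.27$$

$$d_i = 7.49 \times 5 = 37.43$$

$$\mathrm{MF}_i = \sqrt{1} = 1$$

$$D_i = -0.27 \times 37.43 \times 1 = -10.08$$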


We deduce that the effective notional of time buckets is respectively equal to $D_{j,1} = 2.55$,
$D_{j,2} = -72.51$ and $D_{j,3} = D_3 + D_4 = 147.30$. It follows that:

$$\sum_{k=1}^{3} \sum_{k'=1}^{3} \rho_{k,k'} D_{j,k} D_{j,k'} = 2.55^2 - 2 \times 70\% \times 2.55 \times 72.51 + 72.51^2 - 2 \times 70\% \times 72.51 \times 147.30 + 147.30^2 + 2 \times 30\% \times 2.55 \times 147.30 = 11976.1$$

Since the supervisory factor is 0.50%, the add-on value $A^{(\mathrm{ir})}$ is then equal to 0.55. The
replacement cost is:

$$\mathrm{RC} = \max(0.1 - 0.2 + 0.7 + 0.5, 0) = 1.1$$


Because the mark-to-market of the netting set is positive, the PFE multiplier is equal to 1. 
We finally deduce that: 


EAD = 1.4 x (1.1 + 1 x 0.55) = 2.31 
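Example 47 can be replayed step by step with the sketch below. The helper names (`supervisory_duration`, `bucket`, `put_delta`) and the tuple layout are hypothetical. The swaption delta uses the Black formula for a bought put with the 50% supervisory volatility of Table 4.6, and the PFE multiplier is set to one because the net mark-to-market is positive.

```python
from math import exp, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf
# Correlations between the three maturity buckets (Table 4.6)
RHO = [[1.0, 0.7, 0.3], [0.7, 1.0, 0.7], [0.3, 0.7, 1.0]]


def supervisory_duration(start, end):
    return 20.0 * (exp(-0.05 * start) - exp(-0.05 * end))


def bucket(end):
    """Index of the maturity bucket: < 1Y, 1Y-5Y, > 5Y."""
    return 0 if end < 1.0 else (1 if end <= 5.0 else 2)


def put_delta(forward, strike, vol, expiry):
    """Supervisory delta of a bought put (receiver swaption)."""
    d1 = (log(forward / strike) + 0.5 * vol ** 2 * expiry) / (vol * sqrt(expiry))
    return -PHI(-d1)


# (notional, start S_i, end E_i, maturity M_i, delta, mark-to-market)
trades = [
    (4.0, 0.0, 0.75, 0.75, 1.0, 0.10),    # payer IRS
    (20.0, 0.0, 4.0, 4.0, -1.0, -0.20),   # receiver IRS
    (20.0, 0.0, 10.0, 10.0, 1.0, 0.70),   # payer IRS
    (5.0, 1.0, 11.0, 1.0, put_delta(0.06, 0.05, 0.5, 1.0), 0.50),  # swaption
]

D = [0.0, 0.0, 0.0]  # effective notional per maturity bucket
for notional, start, end, maturity, delta, _ in trades:
    d = notional * supervisory_duration(start, end)  # adjusted notional
    mf = sqrt(min(maturity, 1.0))                    # unmargined maturity factor
    D[bucket(end)] += delta * d * mf

quad = sum(RHO[k][l] * D[k] * D[l] for k in range(3) for l in range(3))
addon = 0.005 * sqrt(quad)                           # SF = 0.50%
rc = max(sum(trade[-1] for trade in trades), 0.0)    # replacement cost
ead = 1.4 * (rc + 1.0 * addon)                       # multiplier equal to 1

print(f"D = {[round(x, 2) for x in D]}, add-on = {addon:.2f}, EAD = {ead:.2f}")
```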


Remark 51 Annex 4 of BCBS (2014b) contains four examples of SA-CCR calculations 
and presents also several applications including different hedging sets, netting sets and asset 
classes. 


Even if SA-CCR is a better approach for measuring the counterparty credit risk than
CEM and SM, its conservative calibration has been strongly criticized, in particular the
value of $\alpha$. For instance, the International Swaps and Derivatives Association reports many
examples, where the EAD calculated with SA-CCR is a multiple of the EAD calculated with
CEM and IMM¹⁸. This is particularly true when the mark-to-market is negative and the
hedging set is unmargined. In fact, the industry considers that $\alpha \approx 1$ is more appropriate
than $\alpha = 1.4$.

¹⁸ www.isda.org/a/qTiDE/isda-letter-to-the-bcbs-on-sa-ccr-march-2017.pdf




4.1.4 Impact of wrong way risk 


According to ISDA (2014b), the wrong way risk (WWR) is defined as the risk that 
“occurs when exposure to a counterparty or collateral associated with a transaction is 
adversely correlated with the credit quality of that counterparty”. This means that the 
exposure at default of the OTC contract and the default risk of the counterparty are not 
independent, but positively correlated. Generally, we distinguish two types of wrong way 
risk: 


1. general (or conjectural) wrong way risk occurs when the credit quality of the coun- 
terparty is correlated with macroeconomic factors, which also impact the value of the 
transaction; 


2. specific wrong way risk occurs when the correlation between the exposure at default 
and the probability of default is mainly explained by some idiosyncratic factors. 


For instance, general WWR arises when the level of interest rates impacts both the
mark-to-market of the transaction and the creditworthiness of the counterparty. An example of
specific WWR is when Bank A buys a CDS protection on Bank B from Bank C, and the
default probabilities of B and C are highly correlated. In this case, if the credit quality of B
deteriorates, both the mark-to-market of the transaction and the default risk of C increase.


Remark 52 Right way risk (RWR) corresponds to the situation where the counterparty
exposure and the default risk are negatively correlated. In this case, the mark-to-market of
the transaction decreases as the counterparty approaches default. By definition, RWR is
less of a concern from a regulatory point of view.


4.1.4.1 An example 


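Let us assume that the mark-to-market of the OTC contract is given by a Brownian
motion:

$$\mathrm{MtM}(t) = \mu + \sigma W(t)$$

If we note $e(t) = \max(\mathrm{MtM}(t), 0)$, we have:

$$\mathbb{E}[e(t)] = \int_{-\infty}^{\infty} \max\left(\mu + \sigma\sqrt{t}\,x, 0\right) \phi(x) \, dx$$

$$= \mu \int_{-\mu/(\sigma\sqrt{t})}^{\infty} \phi(x) \, dx + \sigma\sqrt{t} \int_{-\mu/(\sigma\sqrt{t})}^{\infty} x \, \phi(x) \, dx$$

$$= \mu \, \Phi\left(\frac{\mu}{\sigma\sqrt{t}}\right) + \sigma\sqrt{t} \, \phi\left(\frac{\mu}{\sigma\sqrt{t}}\right)$$

We consider the Merton approach for modeling the default time $\tau$ of the counterparty. Let
$B(t) = \Phi^{-1}(1 - S(t))$ be the default barrier, where $S(t)$ is the survival function of the
counterparty. We assume that the dependence between the mark-to-market $\mathrm{MtM}(t)$ and
the survival time is equal to the Normal copula $C(u_1, u_2; \rho)$ with parameter $\rho$. Redon (2006)
shows that¹⁹:

$$\mathbb{E}[e(t) \mid \tau = t] = \mathbb{E}[e(t) \mid B(t) = B] = \mu_B \, \Phi\left(\frac{\mu_B}{\sigma_B}\right) + \sigma_B \, \phi\left(\frac{\mu_B}{\sigma_B}\right)$$

where $\mu_B = \mu + \rho\sigma\sqrt{t}\,B$ and $\sigma_B = \sqrt{1 - \rho^2}\,\sigma\sqrt{t}$. With the exception of $\rho = 0$, we have:

$$\mathbb{E}[e(t)] \ne \mathbb{E}[e(t) \mid \tau = t]$$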


In Figure 4.7, we report the conditional distribution of the mark-to-market given that the
default occurs at time $t = 1$. The parameters are $\mu = 0$, $\sigma = 1$ and $\tau \sim \mathcal{E}(\lambda)$ where $\lambda$ is
calibrated to fit the one-year probability of default PD²⁰. We notice that the exposure at
default decreases with the correlation $\rho$ when PD is equal to 1% (top/left panel), whereas
it increases with the correlation $\rho$ when PD is equal to 99% (top/right panel). We verify
the stochastic dominance of the mark-to-market with respect to the default probability.
Figure 4.8 shows the relationship between the conditional expectation $\mathbb{E}[e(t) \mid \tau = t]$ and
the different parameters²¹. As expected, the exposure at default is an increasing function
of $\mu$, $\sigma$, $\rho$ and PD.
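The two closed-form expressions of this example are easy to evaluate numerically. The sketch below is illustrative: the function names are hypothetical, the default is assumed to occur at time $t$ so that the barrier is $B = \Phi^{-1}(\mathrm{PD})$ with $\mathrm{PD} = 1 - S(t)$, and we require $|\rho| < 1$.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def expected_exposure(mu, sigma, t):
    """Unconditional expectation E[e(t)] for MtM(t) = mu + sigma W(t)."""
    s = sigma * sqrt(t)
    return mu * N.cdf(mu / s) + s * N.pdf(mu / s)


def conditional_exposure(mu, sigma, t, rho, pd):
    """E[e(t) | tau = t] with default barrier B = inverse normal cdf of pd."""
    barrier = N.inv_cdf(pd)
    mu_b = mu + rho * sigma * sqrt(t) * barrier
    sigma_b = sqrt(1.0 - rho ** 2) * sigma * sqrt(t)
    return mu_b * N.cdf(mu_b / sigma_b) + sigma_b * N.pdf(mu_b / sigma_b)


# Effect of the correlation for a low and a high default probability
for pd in (0.01, 0.90):
    for rho in (0.0, 0.25, 0.50, 0.75):
        value = conditional_exposure(0.0, 1.0, 1.0, rho, pd)
        print(f"PD={pd:.0%}  rho={rho:.2f}  E[e(t) | tau=t]={value:.4f}")
```

When $\rho = 0$, the conditional and unconditional expectations coincide, which is a convenient sanity check of the implementation.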


4.1.4.2 Calibration of the a factor 


In the internal model method, the exposure at default is computed by scaling the effective
expected positive exposure:

$$\mathrm{EAD} = \alpha \cdot \mathrm{EEPE}(0; 1)$$

where $\alpha$ is the scaling factor. In this framework, we assume that the mark-to-market of
the OTC transaction and the default risk of the counterparty are not correlated. Therefore,
the Basel Committee requires that the calibration of the scaling factor $\alpha$ incorporates the
general wrong way risk. According to BCBS (2006), we have²²:

$$\alpha = \frac{K\left(\mathrm{EAD}(\tau) \cdot \mathrm{LGD} \cdot \mathbb{1}\{\tau \le T\}\right)}{K\left(\mathrm{EPE} \cdot \mathrm{LGD} \cdot \mathbb{1}\{\tau \le T\}\right)}$$

¹⁹ Since we have $1 - S(t) \sim \mathcal{U}_{[0,1]}$, it follows that $B(t) \sim \mathcal{N}(0, 1)$. We deduce that the random vector
$(\mathrm{MtM}(t), B(t))$ is normally distributed:

$$\begin{pmatrix} \mathrm{MtM}(t) \\ B(t) \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 t & \rho\sigma\sqrt{t} \\ \rho\sigma\sqrt{t} & 1 \end{pmatrix} \right)$$

because the correlation $\rho(\mathrm{MtM}(t), B(t))$ is equal to the Normal copula parameter $\rho$. Using the conditional
expectation formula given on page 1062, it follows that:

$$\mathrm{MtM}(t) \mid B(t) = B \sim \mathcal{N}\left(\mu_B, \sigma_B^2\right)$$

where:

$$\mu_B = \mu + \rho\sigma\sqrt{t}\,(B - 0)$$

and:

$$\sigma_B^2 = \sigma^2 t - \rho^2\sigma^2 t = \left(1 - \rho^2\right)\sigma^2 t$$

²⁰ We have $1 - e^{-\lambda} = \mathrm{PD}$.
²¹ The default values are $\mu = 0$, $\sigma = 1$, PD = 90% and $\rho = 50\%$.
²² Using standard assumptions (single factor model, fine-grained portfolio, etc.), the first-order
approximation is:

$$\alpha \approx \frac{\mathrm{LEE}}{\mathrm{EPE}}$$

FIGURE 4.7: Conditional distribution of the mark-to-market

Again, the Basel Committee considers a conservative approach, since they use EPE instead
of EEPE for defining the denominator of $\alpha$.


The calibration of $\alpha$ for a bank portfolio is a difficult task, because it is not easy to
consider a joint modeling of market and credit risk factors. Let us write the portfolio loss
as follows:

$$L = \sum_{i=1}^{n} \mathrm{EAD}\left(\tau_i, \mathcal{F}_1, \ldots, \mathcal{F}_m\right) \cdot \mathrm{LGD}_i \cdot \mathbb{1}\{\tau_i \le T_i\}$$

where $\mathcal{F} = (\mathcal{F}_1, \ldots, \mathcal{F}_m)$ are the market risk factors and $\tau = (\tau_1, \ldots, \tau_n)$ are the default
times. Wrong way risk implies correlating the random vectors $\mathcal{F}$ and $\tau$. Given a small
portfolio with a low number of transactions and counterparty entities, we can simulate the
portfolio loss and calculate the corresponding $\alpha$, but this Monte Carlo exercise is
unrealistic for a comprehensive bank portfolio. Nevertheless, we can estimate $\alpha$ for more or less
canonical portfolios. For instance, according to Cespedes et al. (2010), the scaling factor $\alpha$
may range from 0.7 to 1.4. When market and credit risks are uncorrelated, $\alpha$ is close to one.
$\alpha$ is less than one for general right way risks, while it is larger than one for general wrong
way risks. However, for realistic market-credit correlations, $\alpha$ is below 1.2.


Remark 53 The treatment of specific wrong way risk is different. First, the bank must 
identify all the counterparty entities where specific WWR is significant, and monitor these 
operations. Second, the bank must calculate a conservative EAD figure. 


Remark 54 The modeling of wrong way risk requires correlating market and credit risk
factors. The main approach is to specify a copula model. As the dimension of the problem
is high ($m$ risk factors and $n$ counterparties), Cespedes et al. (2010) propose to consider a
resampling approach. Another way is to relate the hazard rate of survival functions to the
value of the contract (Hull and White, 2012). These two approaches will be discussed in the
next section.


4.2 Credit valuation adjustment 


CVA is the adjustment to the risk-free (or fair) value of derivative instruments to ac- 
count for counterparty credit risk. Thus, CVA is commonly viewed as the market price of 
CCR. The concept of CVA was popularized after the 2008 Global Financial Crisis, even
if investment banks started to use CVA in the early 1990s (Litzenberger, 1992; Duffie and
Huang, 1996). Indeed, during the global financial crisis, banks suffered significant
counterparty credit risk losses on their OTC derivatives portfolios. However, according to BCBS
(2010), roughly two-thirds of these losses came from CVA markdowns on derivatives and
only one-third were due to counterparty defaults. In a similar way, the Financial Services
Authority concluded that CVA losses were five times larger than CCR losses for UK banks
during the period 2007-2009. In this context, BCBS (2010) included a CVA capital charge in
the Basel III framework, whereas credit-related adjustments were introduced in the
accounting standard IFRS 13, also called Fair Value Measurement²³. Nevertheless, the complexity
of CVA raises several issues (EBA, 2015a). This is why questions around the CVA are not
settled and new standards are emerging, but they only provide partial answers.

²³ IFRS 13 was originally issued in May 2011 and became effective after January 2013.

4.2.1 Definition

4.2.1.1 Difference between CCR and CVA


In order to understand the credit valuation adjustment, it is important to make the
distinction between CCR and CVA. CCR is the credit risk of OTC derivatives associated
with the default of the counterparty, whereas CVA is the market risk of OTC derivatives
associated with the credit migration of the two counterparties. This means that CCR occurs
at the default time. On the contrary, CVA impacts the market value of OTC derivatives
before the default time.

Let us consider an example with two banks A and B and an OTC contract $\mathcal{C}$. The P&L
$\Pi_{A|B}$ of Bank A is equal to:

$$\Pi_{A|B} = \mathrm{MtM} - \mathrm{CVA}_B$$

where MtM is the risk-free mark-to-market value of $\mathcal{C}$ and $\mathrm{CVA}_B$ is the CVA with respect
to Bank B. We assume that Bank A has traded the same contract with Bank C. It follows
that:

$$\Pi_{A|C} = \mathrm{MtM} - \mathrm{CVA}_C$$

In a world where there is no counterparty credit risk, we have:

$$\Pi_{A|B} = \Pi_{A|C} = \mathrm{MtM}$$

If we take into account the counterparty credit risk, the two P&Ls of the same contract are
different because Bank A does not face the same risk:

$$\Pi_{A|B} \ne \Pi_{A|C}$$

In particular, if Bank A wants to close the two exposures, it is obvious that the contract
$\mathcal{C}$ with the counterparty B has more value than the contract $\mathcal{C}$ with the counterparty C
if the credit risk of B is lower than the credit risk of C. In this context, the notion of
mark-to-market is complex, because it depends on the credit risk of the counterparties.

Remark 55 If the bank does not take into account CVA to price its OTC derivatives, it
does not face CVA risk. This situation is now marginal because of the accounting standard
IFRS 13.


4.2.1.2 CVA, DVA and bilateral CVA 


Previously, we have defined the CVA as the market risk related to the credit risk of
the counterparty. According to EBA (2015a), it should reflect today’s best estimate of the
potential loss on the OTC derivative due to the default of the counterparty. In a similar way,
we can define the debit value adjustment (DVA) as the credit-related adjustment capturing
the entity’s own credit risk. In this case, DVA should reflect the potential gain on the
OTC derivative due to the entity’s own default. If we consider our previous example, the
expression of the P&L becomes:

$$\Pi_{A|B} = \mathrm{MtM} + \underbrace{\mathrm{DVA}_A - \mathrm{CVA}_B}_{\text{Bilateral CVA}}$$

The combination of the two credit-related adjustments is called the bilateral CVA. We then
obtain the following cases:

1. if the credit risk of Bank A is lower than the credit risk of Bank B ($\mathrm{DVA}_A < \mathrm{CVA}_B$),
the bilateral CVA of Bank A is negative and reduces the value of the OTC portfolio
from the perspective of Bank A;

2. if the credit risk of Bank A is higher than the credit risk of Bank B ($\mathrm{DVA}_A > \mathrm{CVA}_B$),
the bilateral CVA of Bank A is positive and increases the value of the OTC portfolio
from the perspective of Bank A;

3. if the credit risk of Bank A is equivalent to the credit risk of Bank B, the bilateral
CVA is equal to zero.

We notice that the DVA of Bank A is the CVA of Bank A from the perspective of Bank B:

$$\mathrm{CVA}_A = \mathrm{DVA}_A$$

We also have $\mathrm{DVA}_B = \mathrm{CVA}_B$, which implies that the P&L of Bank B is equal to:

$$\Pi_{B|A} = -\mathrm{MtM} + \mathrm{DVA}_B - \mathrm{CVA}_A = -\mathrm{MtM} + \mathrm{CVA}_B - \mathrm{DVA}_A = -\Pi_{A|B}$$

We deduce that the P&Ls of Banks A and B are coherent in the bilateral CVA framework
as in the risk-free MtM framework. This is not true if we only consider the (unilateral or
one-sided) CVA or DVA adjustment.


In order to define CVA and DVA more precisely, we introduce the following notations:

• The positive exposure $e^{+}(t)$ is the maximum between 0 and the risk-free
mark-to-market:

$$e^{+}(t) = \max\left(\mathrm{MtM}(t), 0\right)$$

This quantity was previously denoted by $e(t)$ and corresponds to the potential future
exposure in the CCR framework.

• The negative exposure $e^{-}(t)$ is the difference between the positive exposure and the
risk-free mark-to-market:

$$e^{-}(t) = e^{+}(t) - \mathrm{MtM}(t)$$

We also have:

$$e^{-}(t) = -\min\left(\mathrm{MtM}(t), 0\right) = \max\left(-\mathrm{MtM}(t), 0\right)$$

The negative exposure is then the equivalent of the positive exposure from the
perspective of the counterparty.


The credit valuation adjustment is the risk-neutral discounted expected value of the potential
loss:

$$\mathrm{CVA} = \mathbb{E}^{\mathbb{Q}}\left[ \mathbb{1}\{\tau_B \le T\} \cdot e^{-\int_0^{\tau_B} r_t \, dt} \cdot L \right]$$

where $T$ is the maturity of the OTC derivative, $\tau_B$ is the default time of Bank B and $L$ is
the counterparty loss:

$$L = (1 - R_B) \cdot e^{+}(\tau_B)$$

Using usual assumptions²⁴, we obtain:

$$\mathrm{CVA} = (1 - R_B) \cdot \int_0^T B_0(t) \, \mathrm{EpE}(t) \, dF_B(t) \tag{4.16}$$

²⁴ The default time and the discount factor are independent and the recovery rate is constant.

where $\mathrm{EpE}(t)$ is the risk-neutral discounted expected positive exposure:

$$\mathrm{EpE}(t) = \mathbb{E}^{\mathbb{Q}}\left[e^{+}(t)\right]$$

and $F_B$ is the cumulative distribution function of $\tau_B$. Knowing that the survival function
$S_B(t)$ is equal to $1 - F_B(t)$, we deduce that:

$$\mathrm{CVA} = (1 - R_B) \cdot \int_0^T -B_0(t) \, \mathrm{EpE}(t) \, dS_B(t) \tag{4.17}$$


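In a similar way, the debit value adjustment is defined as the risk-neutral discounted
expected value of the potential gain:

$$\mathrm{DVA} = \mathbb{E}^{\mathbb{Q}}\left[ \mathbb{1}\{\tau_A \le T\} \cdot e^{-\int_0^{\tau_A} r_t \, dt} \cdot G \right]$$

where $\tau_A$ is the default time of Bank A and:

$$G = (1 - R_A) \cdot e^{-}(\tau_A)$$

Using the same assumptions as previously, it follows that:

$$\mathrm{DVA} = (1 - R_A) \cdot \int_0^T -B_0(t) \, \mathrm{EnE}(t) \, dS_A(t) \tag{4.18}$$

where $\mathrm{EnE}(t)$ is the risk-neutral discounted expected negative exposure:

$$\mathrm{EnE}(t) = \mathbb{E}^{\mathbb{Q}}\left[e^{-}(t)\right]$$

We deduce that the bilateral CVA is:

$$\mathrm{BCVA} = \mathrm{DVA} - \mathrm{CVA} = (1 - R_A) \cdot \int_0^T -B_0(t) \, \mathrm{EnE}(t) \, dS_A(t) - (1 - R_B) \cdot \int_0^T -B_0(t) \, \mathrm{EpE}(t) \, dS_B(t) \tag{4.19}$$

When we calculate the bilateral CVA as the difference between the DVA and the CVA, we
consider that the DVA does not depend on $\tau_B$ and the CVA does not depend on $\tau_A$. In the
more general case, we have:

$$\mathrm{BCVA} = \mathbb{E}^{\mathbb{Q}}\left[ \mathbb{1}\{\tau_A \le \min(T, \tau_B)\} \cdot e^{-\int_0^{\tau_A} r_t \, dt} \cdot G - \mathbb{1}\{\tau_B \le \min(T, \tau_A)\} \cdot e^{-\int_0^{\tau_B} r_t \, dt} \cdot L \right] \tag{4.20}$$

In this case, the calculation of the bilateral CVA requires considering the joint survival
function of $(\tau_A, \tau_B)$.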


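Remark 56 If we assume that the yield curve is flat and $S_B(t) = e^{-\lambda_B t}$, we have $\mathrm{d}S_B(t) = -\lambda_B e^{-\lambda_B t}\,\mathrm{d}t$ and:

$$
\begin{aligned}
\mathrm{CVA} &= (1 - R_B) \cdot \int_0^T e^{-rt}\,\mathrm{EpE}(t)\,\lambda_B e^{-\lambda_B t}\,\mathrm{d}t \\
&= s_B \cdot \int_0^T e^{-(r + \lambda_B)t}\,\mathrm{EpE}(t)\,\mathrm{d}t
\end{aligned}
$$

We notice that the CVA is the product of the CDS spread and the discounted value of the expected positive exposure.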
Example 48 Let us assume that the mark-to-market value is given by:

$$
\mathrm{MtM}(t) = N \int_t^T f(t, T)\,B_t(s)\,\mathrm{d}s - N \int_t^T f(0, T)\,B_t(s)\,\mathrm{d}s
$$

where $N$ and $T$ are the notional and the maturity of the swap, and $f(t, T)$ is the instantaneous forward rate which follows a geometric Brownian motion:

$$
\mathrm{d}f(t, T) = \mu f(t, T)\,\mathrm{d}t + \sigma f(t, T)\,\mathrm{d}W(t)
$$

We also assume that the yield curve is flat, $B_t(s) = e^{-r(s - t)}$, and the risk-neutral survival function is $S(t) = e^{-\lambda t}$.

Syrkin and Shirazi (2015) show that²⁵:

$$
\mathrm{EpE}(t) = N f(0, T)\,\varphi(t, T) \left(e^{\mu t}\,\Phi\left(\left(\frac{\mu}{\sigma} + \frac{\sigma}{2}\right)\sqrt{t}\right) - \Phi\left(\left(\frac{\mu}{\sigma} - \frac{\sigma}{2}\right)\sqrt{t}\right)\right)
$$

where:

$$
\varphi(t, T) = \frac{1 - e^{-r(T - t)}}{r}
$$

It follows that the CVA at time $t$ is equal to:

$$
\mathrm{CVA}(t) = s_B \cdot \int_t^T e^{-(r + \lambda)(u - t)}\,\mathrm{EpE}(u)\,\mathrm{d}u
$$

We consider the following numerical values: $N = 1000$, $f(0, T) = 5\%$, $\mu = 2\%$, $\sigma = 25\%$, $T = 10$ years and $R_B = 50\%$. In Figure 4.9, we have reported the value of $\mathrm{CVA}(t)$ when $\lambda$ is respectively equal to 20 and 100 bps. By construction, the CVA is maximum at the starting date.
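The computation of Example 48 can be sketched numerically as follows. This is only an illustration under stated assumptions: the interest rate `r = 5%` is hypothetical (the example does not state it), the integral is approximated with a simple trapezoidal rule, and the function names `epe` and `cva` are ours.

```python
import math
from statistics import NormalDist

def epe(t, N=1000.0, f0=0.05, mu=0.02, sigma=0.25, T=10.0, r=0.05):
    """Closed-form EpE(t) of Example 48 (r is an assumed value)."""
    if t <= 0.0 or t >= T:
        return 0.0  # the formula gives 0 at t = 0, and phi(T, T) = 0
    phi = (1.0 - math.exp(-r * (T - t))) / r  # annuity factor phi(t, T)
    cdf = NormalDist().cdf
    d1 = (mu / sigma + 0.5 * sigma) * math.sqrt(t)
    d2 = (mu / sigma - 0.5 * sigma) * math.sqrt(t)
    return N * f0 * phi * (math.exp(mu * t) * cdf(d1) - cdf(d2))

def cva(t, lam, R=0.5, T=10.0, r=0.05, n_steps=2000):
    """CVA(t) = s * int_t^T exp(-(r + lam)(u - t)) EpE(u) du (trapezoidal rule)."""
    if t >= T:
        return 0.0
    s = (1.0 - R) * lam  # credit triangle: s = (1 - R) * lambda
    h = (T - t) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        u = t + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * math.exp(-(r + lam) * (u - t)) * epe(u, T=T, r=r)
    return s * h * total

if __name__ == "__main__":
    for lam in (0.002, 0.01):  # 20 bps and 100 bps
        print(lam, [round(cva(t, lam), 4) for t in (0.0, 2.5, 5.0, 7.5)])
```

With these assumed inputs, the profile decreases from the starting date to zero at maturity, and a higher intensity $\lambda$ gives a higher CVA.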


4.2.1.3 Practical implementation for calculating CVA
In practice, we calculate CVA and DVA by approximating the integral by a sum:

$$
\mathrm{CVA} = (1 - R_B) \cdot \sum_{t_i \le T} B_0(t_i) \cdot \mathrm{EpE}(t_i) \cdot \left(S_B(t_{i-1}) - S_B(t_i)\right)
$$

and:

$$
\mathrm{DVA} = (1 - R_A) \cdot \sum_{t_i \le T} B_0(t_i) \cdot \mathrm{EnE}(t_i) \cdot \left(S_A(t_{i-1}) - S_A(t_i)\right)
$$

where $\{t_i\}$ is a partition of $[0, T]$. For the bilateral CVA, the expression (4.20) can be evaluated using Monte Carlo methods.

We notice that the approximation of $\mathrm{d}S_B(t)$ is equal to the default probability of Bank B between two consecutive trading dates:

$$
S_B(t_{i-1}) - S_B(t_i) = \Pr\{t_{i-1} < \tau_B \le t_i\} = \mathrm{PD}_B(t_{i-1}, t_i)
$$

and we may wonder what is the best approach for estimating $\mathrm{PD}_B(t_{i-1}, t_i)$. A straightforward solution is to use the default probabilities computed by the internal credit system.

25 See Exercise 4.4.5 on page 303.

FIGURE 4.9: CVA of fixed-float swaps


However, there is a fundamental difference between CCR and CVA. Indeed, CCR is a default risk and must then be calculated using the historical probability measure $\mathbb{P}$. On the contrary, CVA is a market price, implying that it is valued under the risk-neutral probability measure $\mathbb{Q}$. Therefore, $\mathrm{PD}_B(t_{i-1}, t_i)$ is a risk-neutral probability. Using the credit triangle relationship, we know that the CDS spread $s$ is related to the intensity $\lambda$:

$$
s_B(t) = (1 - R_B) \cdot \lambda_B(t)
$$

We deduce that:

$$
S_B(t) = \exp\left(-\lambda_B(t) \cdot t\right) = \exp\left(-\frac{s_B(t) \cdot t}{1 - R_B}\right)
$$

It follows that the risk-neutral probability of default $\mathrm{PD}_B(t_{i-1}, t_i)$ is equal to:

$$
\mathrm{PD}_B(t_{i-1}, t_i) = \exp\left(-\frac{s_B(t_{i-1}) \cdot t_{i-1}}{1 - R_B}\right) - \exp\left(-\frac{s_B(t_i) \cdot t_i}{1 - R_B}\right)
$$
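The two ingredients above (risk-neutral default probabilities implied by CDS spreads, then the discretized CVA sum) can be sketched as follows. The function names and the input grid are hypothetical; the exposure profile and discount factors are assumed to be supplied by the bank's own models.

```python
import math

def survival_from_spread(s, t, R):
    """S(t) = exp(-s(t) * t / (1 - R)), from the credit triangle s = (1 - R) * lambda."""
    return math.exp(-s * t / (1.0 - R))

def risk_neutral_pds(times, spreads, R):
    """PD(t_{i-1}, t_i) = S(t_{i-1}) - S(t_i), starting from t_0 = 0 with S(0) = 1."""
    pds, s_prev = [], 1.0
    for t, s in zip(times, spreads):
        s_curr = survival_from_spread(s, t, R)
        pds.append(s_prev - s_curr)
        s_prev = s_curr
    return pds

def discretized_cva(epe, discount, pds, R):
    """CVA = (1 - R) * sum_i B_0(t_i) * EpE(t_i) * PD(t_{i-1}, t_i)."""
    return (1.0 - R) * sum(b * e * p for b, e, p in zip(discount, epe, pds))

if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0, 5.0]                   # illustrative grid (years)
    spreads = [0.010, 0.011, 0.012, 0.013, 0.014]       # illustrative CDS spreads
    epe = [10.0, 14.0, 15.0, 12.0, 6.0]                 # illustrative EpE profile
    discount = [math.exp(-0.03 * t) for t in times]     # flat 3% curve (assumption)
    pds = risk_neutral_pds(times, spreads, R=0.4)
    print(round(discretized_cva(epe, discount, pds, R=0.4), 4))
```

The same function gives the DVA when it is fed with the expected negative exposure and the bank's own spread curve and recovery rate.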


4.2.2 Regulatory capital 


The capital charge for the CVA risk was introduced by the Basel Committee in December 2010 after the Global Financial Crisis. At that time, banks had the choice between two approaches: the advanced method (AM-CVA) and the standardized method (SM-CVA). However, the Basel Committee completely changed the CVA framework in December 2017 with two new approaches (BA-CVA and SA-CVA) that will replace the previous approaches (AM-CVA and SM-CVA) with effect from January 2022. It is the first time that the Basel Committee has completely reversed its position within the same accord, since these different approaches are all part of the Basel III Accord.

4.2.2.1 The 2010 version of Basel III


Advanced method The advanced method (or AM-CVA) can be considered by banks that use IMM and VaR models. In this approach, we approximate the integral by the middle Riemann sum:

$$
\mathrm{CVA} = \mathrm{LGD}_B \cdot \sum_{t_i \le T} \left(\frac{\mathrm{EpE}(t_{i-1})\,B_0(t_{i-1}) + B_0(t_i)\,\mathrm{EpE}(t_i)}{2}\right) \cdot \mathrm{PD}_B(t_{i-1}, t_i)
$$

where $\mathrm{LGD}_B = 1 - R_B$ is the risk-neutral loss given default of the counterparty B and $\mathrm{PD}_B(t_{i-1}, t_i)$ is the risk-neutral probability of default between $t_{i-1}$ and $t_i$:

$$
\mathrm{PD}_B(t_{i-1}, t_i) = \max\left(\exp\left(-\frac{s_B(t_{i-1})}{\mathrm{LGD}_B} \cdot t_{i-1}\right) - \exp\left(-\frac{s_B(t_i)}{\mathrm{LGD}_B} \cdot t_i\right), 0\right)
$$

We notice that a zero floor is added in order to ensure that $\mathrm{PD}_B(t_{i-1}, t_i) \ge 0$. The capital charge is then equal to:

$$
K = 3 \cdot (\mathrm{CVA} + \mathrm{SCVA})
$$

where CVA is calculated using the last one-year period and SCVA is the stressed CVA based on a one-year stressed period of credit spreads.
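A minimal sketch of the AM-CVA sum is given below. The function name `am_cva` and the inputs are illustrative; the grid is assumed to start at $t_0 = 0$ and all lists are aligned on that grid. The example shows the effect of the zero floor when the spread curve is strongly inverted.

```python
import math

def am_cva(times, epe, discount, spreads, lgd):
    """Regulatory CVA with averaged discounted exposure and floored PDs.

    times[0] is assumed to be 0; epe, discount and spreads are given on the same grid.
    """
    total = 0.0
    for i in range(1, len(times)):
        pd = max(
            math.exp(-spreads[i - 1] * times[i - 1] / lgd)
            - math.exp(-spreads[i] * times[i] / lgd),
            0.0,  # zero floor on the implied default probability
        )
        avg_exposure = 0.5 * (epe[i - 1] * discount[i - 1] + epe[i] * discount[i])
        total += avg_exposure * pd
    return lgd * total

if __name__ == "__main__":
    # Inverted spread curve (illustrative): the second-period PD is floored at 0.
    print(am_cva([0.0, 1.0, 2.0], [0.0, 10.0, 10.0], [1.0, 0.97, 0.94],
                 [0.02, 0.02, 0.005], lgd=0.6))
```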


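Standardized method In the standardized method (or SM-CVA), the capital charge is equal to:

$$
K = 2.33 \cdot \sqrt{h} \cdot \sqrt{\left(\frac{1}{2} \sum_i w_i \cdot \Omega_i - w^{\star}_{\mathrm{index}} \cdot \Omega^{\star}_{\mathrm{index}}\right)^2 + \frac{3}{4} \sum_i w_i^2 \cdot \Omega_i^2} \tag{4.21}
$$

where:

$$
\Omega_i = M_i \cdot \mathrm{EAD}_i \cdot \frac{1 - e^{-0.05 \cdot M_i}}{0.05 \cdot M_i} - M_i^{\star} \cdot H_i^{\star} \cdot \frac{1 - e^{-0.05 \cdot M_i^{\star}}}{0.05 \cdot M_i^{\star}}
$$

$$
\Omega^{\star}_{\mathrm{index}} = M^{\star}_{\mathrm{index}} \cdot H^{\star}_{\mathrm{index}} \cdot \frac{1 - e^{-0.05 \cdot M^{\star}_{\mathrm{index}}}}{0.05 \cdot M^{\star}_{\mathrm{index}}}
$$

In this formula, $h$ is the time horizon (one year), $w_i$ is the weight of the $i$-th counterparty based on its rating, $M_i$ is the effective maturity of the $i$-th netting set, $\mathrm{EAD}_i$ is the exposure at default of the $i$-th netting set, $M_i^{\star}$ is the maturity adjustment factor for the single-name hedge, $H_i^{\star}$ is the hedging notional of the single-name hedge, $w^{\star}_{\mathrm{index}}$ is the weight of the index hedge, $M^{\star}_{\mathrm{index}}$ is the maturity adjustment factor for the index hedge and $H^{\star}_{\mathrm{index}}$ is the hedging notional of the index hedge. In this formula, $\mathrm{EAD}_i$ corresponds to the CCR exposure at default calculated with the CEM or IMM approaches.

Remark 57 We notice that the Basel Committee recognizes credit hedges (single-name CDS, contingent CDS and CDS indices) for reducing CVA volatility. If there is no hedge, we obtain:

$$
K = 2.33 \cdot \sqrt{h} \cdot \sqrt{\frac{1}{4} \left(\sum_i w_i \cdot M_i \cdot \mathrm{EAD}_i\right)^2 + \frac{3}{4} \sum_i w_i^2 \cdot M_i^2 \cdot \mathrm{EAD}_i^2}
$$

where, for consistency with Equation (4.21), $\mathrm{EAD}_i$ is understood here as the discounted exposure $\mathrm{EAD}_i \cdot \left(1 - e^{-0.05 \cdot M_i}\right) / \left(0.05 \cdot M_i\right)$.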
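Equation (4.21) translates directly into code. The sketch below is only illustrative: the function names (`discount`, `sm_cva`) and the argument layout are ours, and all numerical inputs are hypothetical.

```python
import math

def discount(m):
    """Supervisory discount factor (1 - exp(-0.05 m)) / (0.05 m); 1 when m = 0."""
    return (1.0 - math.exp(-0.05 * m)) / (0.05 * m) if m > 0 else 1.0

def sm_cva(w, M, EAD, M_h=None, H=None, w_idx=0.0, M_idx=0.0, H_idx=0.0, h=1.0):
    """SM-CVA capital charge of Equation (4.21).

    w, M, EAD: weights, effective maturities and exposures per netting set.
    M_h, H: maturities and notionals of single-name hedges (optional).
    w_idx, M_idx, H_idx: weight, maturity and notional of one index hedge.
    """
    n = len(w)
    M_h = M_h or [0.0] * n
    H = H or [0.0] * n
    omega = [
        M[i] * EAD[i] * discount(M[i]) - M_h[i] * H[i] * discount(M_h[i])
        for i in range(n)
    ]
    omega_idx = M_idx * H_idx * discount(M_idx)
    systematic = 0.5 * sum(wi * oi for wi, oi in zip(w, omega)) - w_idx * omega_idx
    idiosyncratic = 0.75 * sum((wi * oi) ** 2 for wi, oi in zip(w, omega))
    return 2.33 * math.sqrt(h) * math.sqrt(systematic ** 2 + idiosyncratic)

if __name__ == "__main__":
    unhedged = sm_cva([0.01, 0.02], [1.0, 3.0], [100.0, 50.0])
    hedged = sm_cva([0.01, 0.02], [1.0, 3.0], [100.0, 50.0],
                    M_h=[1.0, 3.0], H=[40.0, 20.0])
    print(round(unhedged, 4), round(hedged, 4))
```

For a single unhedged counterparty, the charge reduces to $2.33 \cdot \sqrt{h} \cdot w_1 \cdot \Omega_1$ because the weights $1/4$ and $3/4$ sum to one.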


The derivation of Equation (4.21) is explained in Pykhtin (2012). We consider a Gaussian random vector $X = (X_1, \ldots, X_n)$ with $X_i \sim \mathcal{N}(0, \sigma_i^2)$. We assume that the random variables $X_1, \ldots, X_n$ follow a single risk factor model such that the correlation $\rho(X_i, X_j)$ is constant and equal to $\rho$. We consider another random variable $X_{n+1} \sim \mathcal{N}(0, \sigma_{n+1}^2)$ such that $\rho(X_i, X_{n+1})$ is also constant and equal to $\rho_{n+1}$. Let $Y$ be the random variable defined as the sum of $X_i$'s minus $X_{n+1}$:

$$
Y = \sum_{i=1}^{n} X_i - X_{n+1}
$$

It follows that $Y \sim \mathcal{N}(0, \sigma_Y^2)$ where:

$$
\sigma_Y^2 = \sum_{i=1}^{n} \sigma_i^2 + 2\rho \sum_{i=1}^{n} \sum_{j=1}^{i-1} \sigma_i \sigma_j - 2\rho_{n+1}\sigma_{n+1} \sum_{i=1}^{n} \sigma_i + \sigma_{n+1}^2
$$

We finally deduce that:

$$
F_Y^{-1}(\alpha) = \Phi^{-1}(\alpha) \sqrt{\sum_{i=1}^{n} \sigma_i^2 + 2\rho \sum_{i=1}^{n} \sum_{j=1}^{i-1} \sigma_i \sigma_j - 2\rho_{n+1}\sigma_{n+1} \sum_{i=1}^{n} \sigma_i + \sigma_{n+1}^2}
$$

Equation (4.21) is obtained by setting $\sigma_i = w_i \Omega_i$, $\sigma_{n+1} = w^{\star}_{\mathrm{index}} \Omega^{\star}_{\mathrm{index}}$, $\rho = 25\%$, $\rho_{n+1} = 50\%$ and $\alpha = 99\%$. This means that $X_i$ is the CVA net exposure of the $i$-th netting set (including individual hedges) and $X_{n+1}$ is the macro hedge of the CVA based on credit indices.
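The intermediate algebra can be written out as follows. Using the identity $2 \sum_{i} \sum_{j < i} \sigma_i \sigma_j = \left(\sum_i \sigma_i\right)^2 - \sum_i \sigma_i^2$, the variance of $Y$ becomes:

$$
\begin{aligned}
\sigma_Y^2 &= (1 - \rho) \sum_{i=1}^{n} \sigma_i^2 + \rho \left(\sum_{i=1}^{n} \sigma_i\right)^2 - 2\rho_{n+1}\sigma_{n+1} \sum_{i=1}^{n} \sigma_i + \sigma_{n+1}^2 \\
&= \frac{3}{4} \sum_{i=1}^{n} \sigma_i^2 + \frac{1}{4} \left(\sum_{i=1}^{n} \sigma_i\right)^2 - \sigma_{n+1} \sum_{i=1}^{n} \sigma_i + \sigma_{n+1}^2 \qquad (\rho = 25\%,\ \rho_{n+1} = 50\%) \\
&= \left(\frac{1}{2} \sum_{i=1}^{n} \sigma_i - \sigma_{n+1}\right)^2 + \frac{3}{4} \sum_{i=1}^{n} \sigma_i^2
\end{aligned}
$$

The square completes exactly because $\rho_{n+1}^2 = \rho$. Since $\Phi^{-1}(99\%) \approx 2.33$, we recover the structure of Equation (4.21), the factor $\sqrt{h}$ being the scaling to the time horizon.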


4.2.2.2 The 2017 version of Basel III


There are now two approaches available for calculating CVA risk: the basic approach (BA-CVA) and the standardized approach (SA-CVA). However, if the bank has little exposure to counterparty credit risk²⁶, it may choose to set its CVA capital requirement equal to its CCR capital requirement.


Basic approach Under the basic approach, the capital requirement is equal to:

$$
K = \beta \cdot K^{\mathrm{Reduced}} + (1 - \beta) \cdot K^{\mathrm{Hedged}}
$$

where $K^{\mathrm{Reduced}}$ and $K^{\mathrm{Hedged}}$ are the capital requirements without and with hedging recognition. The reduced version of the BA-CVA is obtained by setting $\beta$ to 100%. A bank that actively hedges CVA risks may choose the full version of the BA-CVA. In this case, $\beta$ is set to 25%.

For the reduced version, we have:

$$
K^{\mathrm{Reduced}} = \sqrt{\left(\rho \cdot \sum_j \mathrm{SCVA}_j\right)^2 + (1 - \rho^2) \cdot \sum_j \mathrm{SCVA}_j^2}
$$

where $\rho = 50\%$ and $\mathrm{SCVA}_j$ is the CVA capital requirement for the $j$-th counterparty:

$$
\mathrm{SCVA}_j = \frac{1}{\alpha} \cdot \mathrm{RW}_j \cdot \sum_k \mathrm{DF}_k \cdot \mathrm{EAD}_k \cdot M_k
$$

In this formula, $\alpha$ is set to 1.4, $\mathrm{RW}_j$ is the risk weight for counterparty $j$, $k$ is the netting set, $\mathrm{DF}_k$ is the discount factor, $\mathrm{EAD}_k$ is the CCR exposure at default and $M_k$ is the effective maturity. These last three quantities are calculated at the netting set level. If the bank uses the IMM to calculate the exposure at default, $\mathrm{DF}_k$ is equal to one, otherwise we have:

$$
\mathrm{DF}_k = \frac{1 - e^{-0.05 \cdot M_k}}{0.05 \cdot M_k}
$$

$\mathrm{RW}_j$ depends on the credit quality of the counterparty (IG/HY) and its sector and is given in Table 4.7.

26 The materiality threshold is €100 bn for the notional amount of non-centrally cleared derivatives.


TABLE 4.7: Supervisory risk weights (BA-CVA)

                                                                    Credit quality
Sector                                                              IG       HY/NR
Sovereign                                                           0.5%     3.0%
Local government                                                    1.0%     4.0%
Financial                                                           5.0%     12.0%
Basic material, energy, industrial, agriculture,
  manufacturing, mining and quarrying                               3.0%     7.0%
Consumer goods and services, transportation and storage,
  administrative and support service activities                     3.0%     8.5%
Technology, telecommunication                                       2.0%     5.5%
Health care, utilities, professional and technical activities       1.5%     5.0%
Other sector                                                        5.0%     12.0%

Source: BCBS (2017c).


The full version of the BA-CVA recognizes eligible hedging transactions that are used for mitigating the credit spread component of the CVA risk. They correspond to single-name CDS and index CDS transactions. $K^{\mathrm{Hedged}}$ depends on three components:

$$
K^{\mathrm{Hedged}} = \sqrt{K_1 + K_2 + K_3}
$$

According to BCBS (2017c), the first term aggregates the systematic components of the CVA risk:

$$
K_1 = \left(\rho \cdot \sum_j \left(\mathrm{SCVA}_j - \mathrm{SNH}_j\right) - \mathrm{IH}\right)^2
$$

where $\mathrm{SNH}_j$ is the CVA reduction for counterparty $j$ due to single-name hedging and $\mathrm{IH}$ is the global CVA reduction due to index hedging. The second term aggregates the idiosyncratic components of the CVA risk:

$$
K_2 = (1 - \rho^2) \cdot \sum_j \left(\mathrm{SCVA}_j - \mathrm{SNH}_j\right)^2
$$

Finally, the third term corresponds to the hedging misalignment risk because of the mismatch between indirect hedges and single-name hedges:

$$
K_3 = \sum_j \mathrm{HMA}_j
$$
The single-name hedge $\mathrm{SNH}_j$ is calculated as follows:

$$
\mathrm{SNH}_j = \sum_{h \in j} \varrho_{h,j} \cdot \left(\mathrm{RW}_h \cdot \mathrm{DF}_h \cdot N_h \cdot M_h\right)
$$

where $h$ represents the single-name CDS transaction, $\varrho_{h,j}$ is the supervisory correlation, $\mathrm{DF}_h$ is the discount factor²⁷, $N_h$ is the notional and $M_h$ is the remaining maturity. These quantities are calculated at the single-name CDS level. The correlation $\varrho_{h,j}$ between the credit spread of the counterparty and the credit spread of the CDS can take three values: 100% if CDS $h$ directly refers to counterparty $j$, 80% if CDS $h$ has a legal relation with counterparty $j$, and 50% if CDS $h$ and counterparty $j$ are of the same sector and region. For the index hedge $\mathrm{IH}$, we have a similar formula:

$$
\mathrm{IH} = \sum_{h'} \mathrm{RW}_{h'} \cdot \mathrm{DF}_{h'} \cdot N_{h'} \cdot M_{h'}
$$

where $h'$ represents the index CDS transaction. The other quantities $\mathrm{RW}_{h'}$, $\mathrm{DF}_{h'}$, $N_{h'}$ and $M_{h'}$ are defined exactly as previously except that they are applied at the index CDS level. For the risk weight, its value is the weighted average of the risk weights $\mathrm{RW}_j$:

$$
\mathrm{RW}_{h'} = 0.7 \cdot \sum_{j \in h'} w_j \cdot \mathrm{RW}_j
$$

where $w_j$ is the weight of the counterparty/sector $j$ in the index CDS $h'$. We notice that this formula reduces to $\mathrm{RW}_{h'} = 0.7 \cdot \mathrm{RW}_j$ when we consider a sector-specific index. Finally, we have:

$$
\mathrm{HMA}_j = \sum_{h \in j} \left(1 - \varrho_{h,j}^2\right) \cdot \left(\mathrm{RW}_h \cdot \mathrm{DF}_h \cdot N_h \cdot M_h\right)^2
$$


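Remark 58 In the case where there is no hedge, we have $\mathrm{SNH}_j = 0$, $\mathrm{HMA}_j = 0$, $\mathrm{IH} = 0$, and $K = K^{\mathrm{Reduced}}$. If there is no hedging misalignment risk and no index CDS hedging, we have:

$$
K^{\mathrm{Hedged}} = \sqrt{\left(\rho \cdot \sum_j K_j\right)^2 + (1 - \rho^2) \cdot \sum_j K_j^2}
$$

where $K_j = \mathrm{SCVA}_j - \mathrm{SNH}_j$ is the single-name capital requirement for counterparty $j$.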


Example 49 We assume that the bank has three financial counterparties A, B and C, that are respectively rated IG, IG and HY. There are 4 OTC transactions, whose characteristics are the following:

Transaction k      1      2      3      4
Counterparty       A      A      B      C
EAD_k            100     50     70     20
M_k                1      1    0.5    0.5

In order to reduce the counterparty credit risk, the bank has purchased a CDS protection on A for an amount of $75 mn, a CDS protection on B for an amount of $10 mn and a HY Financial CDX for an amount of $10 mn. The maturity of hedges exactly matches the maturity of transactions. However, the CDS protection on B is indirect, because the underlying name is not B, but B' which is the parent company of B.

27 We have:

$$
\mathrm{DF}_h = \frac{1 - e^{-0.05 \cdot M_h}}{0.05 \cdot M_h}
$$

where $M_h$ is the remaining maturity.
We first calculate the discount factors $\mathrm{DF}_k$ for the four transactions. We obtain $\mathrm{DF}_1 = \mathrm{DF}_2 = 0.9754$ and $\mathrm{DF}_3 = \mathrm{DF}_4 = 0.9876$. Then we calculate the single-name capital for each counterparty. For example, we have:

$$
\begin{aligned}
\mathrm{SCVA}_A &= \frac{1}{\alpha} \times \mathrm{RW}_A \times \left(\mathrm{DF}_1 \times \mathrm{EAD}_1 \times M_1 + \mathrm{DF}_2 \times \mathrm{EAD}_2 \times M_2\right) \\
&= \frac{1}{1.4} \times 5\% \times \left(0.9754 \times 100 \times 1 + 0.9754 \times 50 \times 1\right) \\
&= 5.225
\end{aligned}
$$

We also find that $\mathrm{SCVA}_B = 1.235$ and $\mathrm{SCVA}_C = 0.847$. It follows that $\sum_j \mathrm{SCVA}_j = 7.306$ and $\sum_j \mathrm{SCVA}_j^2 = 29.546$. The capital requirement without hedging is equal to:

$$
K^{\mathrm{Reduced}} = \sqrt{(0.5 \times 7.306)^2 + (1 - 0.5^2) \times 29.546} = 5.959
$$

We notice that it is lower than the sum of individual capital charges. In order to take into account the hedging effect, we calculate the single-name hedge parameters:

$$
\mathrm{SNH}_A = 5\% \times 100\% \times 0.9754 \times 75 \times 1 = 3.658
$$

and:

$$
\mathrm{SNH}_B = 5\% \times 80\% \times 0.9876 \times 10 \times 0.5 = 0.198
$$

Since the CDS protection is on B' and not B, there is a hedging misalignment risk:

$$
\mathrm{HMA}_B = 0.05^2 \times (1 - 0.80^2) \times (0.9876 \times 10 \times 0.5)^2 = 0.022
$$

For the CDX protection, we have:

$$
\mathrm{IH} = (0.7 \times 12\%) \times 0.9876 \times 10 \times 0.5 = 0.415
$$

Then, we obtain $K_1 = 1.718$, $K_2 = 3.187$, $K_3 = 0.022$ and $K^{\mathrm{Hedged}} = 2.220$. Finally, the capital requirement is equal to $3.154 mn:

$$
K = 0.25 \times 5.959 + 0.75 \times 2.220 = 3.154
$$
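The figures of Example 49 can be reproduced with a short script. The data layout and the function name `ba_cva_example` are ours; the inputs are those of the example and of Table 4.7.

```python
import math

ALPHA, RHO, BETA = 1.4, 0.5, 0.25

def df(m):
    """Supervisory discount factor (1 - exp(-0.05 m)) / (0.05 m)."""
    return (1.0 - math.exp(-0.05 * m)) / (0.05 * m)

def ba_cva_example():
    rw = {"A": 0.05, "B": 0.05, "C": 0.12}  # Financial IG, IG and HY
    # (counterparty, EAD, effective maturity)
    trades = [("A", 100.0, 1.0), ("A", 50.0, 1.0), ("B", 70.0, 0.5), ("C", 20.0, 0.5)]
    # (counterparty, supervisory correlation, risk weight, notional, maturity)
    single_name = [("A", 1.0, 0.05, 75.0, 1.0), ("B", 0.8, 0.05, 10.0, 0.5)]
    # (risk weight, notional, maturity) for the HY Financial CDX
    index = [(0.7 * 0.12, 10.0, 0.5)]

    scva = {j: 0.0 for j in rw}
    for j, ead, m in trades:
        scva[j] += rw[j] * df(m) * ead * m / ALPHA

    snh = {j: 0.0 for j in rw}
    hma = {j: 0.0 for j in rw}
    for j, corr, rw_h, n_h, m_h in single_name:
        x = rw_h * df(m_h) * n_h * m_h
        snh[j] += corr * x
        hma[j] += (1.0 - corr ** 2) * x ** 2
    ih = sum(rw_h * df(m_h) * n_h * m_h for rw_h, n_h, m_h in index)

    k_reduced = math.sqrt((RHO * sum(scva.values())) ** 2
                          + (1.0 - RHO ** 2) * sum(v ** 2 for v in scva.values()))
    net = [scva[j] - snh[j] for j in rw]
    k1 = (RHO * sum(net) - ih) ** 2
    k2 = (1.0 - RHO ** 2) * sum(v ** 2 for v in net)
    k3 = sum(hma.values())
    k_hedged = math.sqrt(k1 + k2 + k3)
    return {"K1": k1, "K2": k2, "K3": k3, "K_reduced": k_reduced,
            "K_hedged": k_hedged, "K": BETA * k_reduced + (1.0 - BETA) * k_hedged}

if __name__ == "__main__":
    for name, value in ba_cva_example().items():
        print(name, round(value, 3))
```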


Standardized approach The standardized approach for CVA follows the same principles as the standardized approach SA-TB for the market risk of the trading book. The main difference is that SA-CVA is only based on delta and vega risks, and does not include curvature, jump-to-default and residual risks:

$$
K = K^{\mathrm{Delta}} + K^{\mathrm{Vega}}
$$

For computing the capital charge, we first consider two portfolios: the CVA portfolio and the hedging portfolio. For each risk (delta and vega), we calculate the weighted CVA sensitivity of each risk factor $F_j$:

$$
\mathrm{WS}_j^{\mathrm{CVA}} = S_j^{\mathrm{CVA}} \cdot \mathrm{RW}_j
$$

and:

$$
\mathrm{WS}_j^{\mathrm{Hedge}} = S_j^{\mathrm{Hedge}} \cdot \mathrm{RW}_j
$$

where $S_j$ and $\mathrm{RW}_j$ are the net sensitivity of the CVA or hedging portfolio with respect to the risk factor and the risk weight of $F_j$. Then, we aggregate the weighted sensitivities in order to obtain a net figure:

$$
\mathrm{WS}_j = \mathrm{WS}_j^{\mathrm{CVA}} + \mathrm{WS}_j^{\mathrm{Hedge}}
$$
Second, we calculate the capital requirement for the risk bucket $\mathcal{B}_k$:

$$
K_{\mathcal{B}_k} = \sqrt{\sum_j \mathrm{WS}_j^2 + \sum_j \sum_{j' \ne j} \rho_{j,j'} \cdot \mathrm{WS}_j \cdot \mathrm{WS}_{j'} + 1\% \cdot \sum_j \left(\mathrm{WS}_j^{\mathrm{Hedge}}\right)^2}
$$

where $F_j \in \mathcal{B}_k$. Finally, we aggregate the different buckets for a given risk class:

$$
K^{\mathrm{Delta/Vega}} = m_{\mathrm{CVA}} \cdot \sqrt{\sum_k K_{\mathcal{B}_k}^2 + \sum_k \sum_{k' \ne k} \gamma_{k,k'} \cdot K_{\mathcal{B}_k} \cdot K_{\mathcal{B}_{k'}}}
$$

where $m_{\mathrm{CVA}} = 1.25$ is the multiplier factor. As in the case of SA-TB, SA-CVA is then based on the following set of parameters: the sensitivities $S_j$ of the risk factors that are calculated by the bank; the risk weights $\mathrm{RW}_j$ of the risk factors; the correlation $\rho_{j,j'}$ between risk factors within a bucket; the correlation $\gamma_{k,k'}$ between the risk buckets. The values of these parameters are not necessarily equal to those of SA-TB²⁸. For instance, the correlations $\rho_{j,j'}$ and $\gamma_{k,k'}$ are generally lower. The reason is that these correlations reflect the dependence between credit risk factors and not market risk factors.
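The two aggregation steps can be sketched as below. The function names are ours, the weighted sensitivities and correlation matrices are hypothetical inputs (the supervisory values are given in BCBS, 2017c), and the floor at zero under the square root is a numerical safeguard that we add, not part of the formulas above.

```python
import math

def bucket_capital(ws_cva, ws_hedge, rho):
    """K_b for one bucket; rho[j][jp] is the correlation between factors j and jp."""
    ws = [c + h for c, h in zip(ws_cva, ws_hedge)]  # net weighted sensitivities
    n = len(ws)
    total = sum(x * x for x in ws)
    total += sum(rho[j][jp] * ws[j] * ws[jp]
                 for j in range(n) for jp in range(n) if jp != j)
    total += 0.01 * sum(h * h for h in ws_hedge)    # 1% hedging add-on
    return math.sqrt(max(total, 0.0))               # floor: our own safeguard

def risk_class_capital(k_buckets, gamma, m_cva=1.25):
    """Aggregate bucket capitals with the cross-bucket correlations gamma[k][kp]."""
    n = len(k_buckets)
    total = sum(k * k for k in k_buckets)
    total += sum(gamma[k][kp] * k_buckets[k] * k_buckets[kp]
                 for k in range(n) for kp in range(n) if kp != k)
    return m_cva * math.sqrt(max(total, 0.0))

if __name__ == "__main__":
    rho = [[1.0, 0.5], [0.5, 1.0]]                  # illustrative correlations
    k_b1 = bucket_capital([10.0, 6.0], [-4.0, 0.0], rho)
    k_b2 = bucket_capital([3.0], [0.0], [[1.0]])
    print(round(risk_class_capital([k_b1, k_b2], [[1.0, 0.2], [0.2, 1.0]]), 4))
```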


Remark 59 Contrary to the SA-TB, the bank must have the approval of the supervisory authority to use the SA-CVA. Otherwise, it must use the BA-CVA framework.


4.2.3 CVA and wrong/right way risk


The wrong way or right way risk is certainly the big challenge when modeling CVA. 
We have already illustrated this point in the case of the CCR capital requirement, but this 
is even more relevant when computing the CVA capital requirement. The reason is that 
the bank generally manages the CVA risk because it represents a huge cost in terms of 
regulatory capital and it impacts on a daily basis the P&L of the trading book. For that, 
the bank generally puts in place a CVA trading desk, whose objective is to mitigate CVA 
risks. Therefore, the CVA desk must develop a fine modeling of WWR/RWR risks in order 
to be efficient and to be sure that the hedging portfolio does not create itself another source 
of hidden wrong way risk. This is why the CVA modeling is relatively complex, because we 
cannot assume in practice that market and credit risks are not correlated. 


We reiterate that the definition of the CVA is²⁹:

$$
\mathrm{CVA} = \mathbb{E}\left[\mathbf{1}\{\tau \le T\} \cdot B_0(\tau) \cdot (1 - R) \cdot e^+(\tau)\right]
$$

where $e^+(t) = \max(\omega, 0)$ and $\omega$ is the random variable that represents the mark-to-market³⁰. If we assume that the recovery rate is constant and interest rates are deterministic, we obtain:

$$
\begin{aligned}
\mathrm{CVA} &= (1 - R) \cdot \int_0^T \int_{-\infty}^{+\infty} B_0(t) \max(\omega, 0)\,\mathrm{d}F(\omega, t) \\
&= (1 - R) \cdot \int_0^T \int_{-\infty}^{+\infty} B_0(t) \max(\omega, 0)\,\mathrm{d}C\left(F_\omega(\omega), F_\tau(t)\right)
\end{aligned}
$$

28 See BCBS (2017c) on pages 119-127.

29 In order to obtain more concise formulas, we delete the reference to the counterparty B and we write $R$ instead of $R_B$.

30 We implicitly assume that the mark-to-market is a stationary process. In fact, this assumption is not verified. However, we use this simplification to illustrate how the dependence between the counterparty exposure and the default times changes the CVA figure.
where $F(\omega, t)$ is the joint distribution of the mark-to-market and the default time and $C$ is the copula between $\omega$ and $\tau$. If we assume that $C = C^{\perp}$, we retrieve the traditional CVA formula³¹:

$$
\begin{aligned}
\mathrm{CVA} &= (1 - R) \cdot \int_0^T \int_{-\infty}^{+\infty} B_0(t) \max(\omega, 0)\,\mathrm{d}F_\omega(\omega)\,\mathrm{d}F_\tau(t) \\
&= (1 - R) \cdot \int_0^T B_0(t)\,\mathrm{EpE}(t)\,\mathrm{d}F_\tau(t)
\end{aligned}
$$

where $\mathrm{EpE}(t)$ is the expected positive exposure:

$$
\mathrm{EpE}(t) = \int_{-\infty}^{+\infty} \max(\omega, 0)\,\mathrm{d}F_\omega(\omega) = \mathbb{E}\left[e^+(t)\right]
$$

Otherwise, we have to model the dependence between the mark-to-market and the default time. In what follows, we consider two approaches: the copula model introduced by Cespedes et al. (2010) and the hazard rate model of Hull and White (2012).


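The copula approach The Monte Carlo CVA is calculated as follows:

$$
\mathrm{CVA} = (1 - R) \sum_{t_i \le T} B_0(t_i) \left(\frac{1}{n_S} \sum_{s=1}^{n_S} e^+(t_i; \omega_s)\right) \left(F_\tau(t_i) - F_\tau(t_{i-1})\right)
$$

where $e^+(t_i; \omega_s)$ is the counterparty exposure of the $s$-th simulated scenario $\omega_s$ and $n_S$ is the number of simulations. If market and credit risk factors are correlated, the Monte Carlo CVA becomes:

$$
\mathrm{CVA} = (1 - R) \sum_{t_i \le T} \sum_{s=1}^{n_S} B_0(t_i)\,e^+(t_i; \omega_s)\,\pi_{s,i} \tag{4.22}
$$

where³²:

$$
\pi_{s,i} = \Pr\{\omega = \omega_s,\ t_{i-1} < \tau \le t_i\}
$$

The objective is then to calculate the joint probability by assuming a copula function $C$ between $\omega$ and $\tau$. For that, we assume that the scenarios $\omega_s$ are ordered. Let $U = F_\omega(\omega)$ and $V = F_\tau(\tau)$ be the integral transforms of $\omega$ and $\tau$. Since $U$ and $V$ are uniform random variables, we obtain:

$$
\begin{aligned}
\pi_{s,i} &= \Pr\{\omega_{s-1} < \omega \le \omega_s,\ t_{i-1} < \tau \le t_i\} \\
&= \Pr\{u_{s-1} < U \le u_s,\ v_{i-1} < V \le v_i\} \\
&= C(u_s, v_i) - C(u_{s-1}, v_i) - C(u_s, v_{i-1}) + C(u_{s-1}, v_{i-1})
\end{aligned} \tag{4.23}
$$

Generally, we don't know the analytical expression of $F_\omega$. This is why we replace it by the empirical distribution $\hat{F}_\omega$ where the probability of each scenario is equal to $1/n_S$.

In order to define the copula function $C$, Rosen and Saunders (2012) consider a market-credit version of the Basel model. Let $Z_m = \Phi^{-1}(F_\omega(\omega))$ and $Z_c = \Phi^{-1}(F_\tau(\tau))$ be the normalized latent random variables for market and credit risks. Rosen and Saunders use the one-factor model specification:

$$
\begin{aligned}
Z_m &= \rho_m X + \sqrt{1 - \rho_m^2}\,\varepsilon_m \\
Z_c &= \rho_c X + \sqrt{1 - \rho_c^2}\,\varepsilon_c
\end{aligned}
$$

where $X$ is the systematic risk factor that impacts both market and credit risks, $\varepsilon_m$ and $\varepsilon_c$ are the idiosyncratic market and credit risk factors, and $\rho_m$ and $\rho_c$ are the market and credit correlations with the common risk factor. It follows that the market-credit correlation is equal to:

$$
\rho_{m,c} = \mathbb{E}\left[Z_m Z_c\right] = \rho_m \rho_c
$$

We deduce that the dependence between $Z_m$ and $Z_c$ is a Normal copula with parameter $\rho_{m,c} = \rho_m \rho_c$, and we can write:

$$
Z_m = \rho_{m,c} Z_c + \sqrt{1 - \rho_{m,c}^2}\,\varepsilon_{m,c}
$$

where $\varepsilon_{m,c} \sim \mathcal{N}(0, 1)$ is an independent specific risk factor. Since the expression of the Normal copula is $C(u, v; \rho_{m,c}) = \Phi_2\left(\Phi^{-1}(u), \Phi^{-1}(v); \rho_{m,c}\right)$, Equation (4.23) becomes³³:

$$
\begin{aligned}
\pi_{s,i} ={} & \Phi_2\left(\Phi^{-1}\left(\frac{s}{n_S}\right), \Phi^{-1}\left(F_\tau(t_i)\right); \rho_{m,c}\right) - \Phi_2\left(\Phi^{-1}\left(\frac{s-1}{n_S}\right), \Phi^{-1}\left(F_\tau(t_i)\right); \rho_{m,c}\right) - {} \\
& \Phi_2\left(\Phi^{-1}\left(\frac{s}{n_S}\right), \Phi^{-1}\left(F_\tau(t_{i-1})\right); \rho_{m,c}\right) + \Phi_2\left(\Phi^{-1}\left(\frac{s-1}{n_S}\right), \Phi^{-1}\left(F_\tau(t_{i-1})\right); \rho_{m,c}\right)
\end{aligned}
$$

31 See Equation (4.16) on page 280.

32 In the case where $\omega$ and $\tau$ are independent, we retrieve the previous formula because we have:

$$
\pi_{s,i} = \Pr\{\omega = \omega_s\} \cdot \Pr\{t_{i-1} < \tau \le t_i\} = \frac{F_\tau(t_i) - F_\tau(t_{i-1})}{n_S}
$$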


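This approach is called the ordered-scenario copula model (OSC), because it is based on the ordering trick of the scenarios $\omega_s$. Rosen and Saunders (2012) also propose different versions of the CVA discretization leading to different expressions of Equation (4.22). For instance, if we assume that the default occurs exactly at time $t_i$ and not in the interval $[t_{i-1}, t_i]$, we have:

$$
\pi_{s,i} \approx \pi_{s \mid i} \cdot \Pr\{t_{i-1} < \tau \le t_i\}
$$

and:

$$
\begin{aligned}
\pi_{s \mid i} &= \Pr\{\omega = \omega_s \mid \tau = t_i\} \\
&= \Pr\{\omega_{s-1} < \omega \le \omega_s \mid \tau = t_i\} \\
&= \Pr\{u_{s-1} < U \le u_s \mid V = v_i\} \\
&= \partial_2 C(u_s, v_i) - \partial_2 C(u_{s-1}, v_i) \\
&= \partial_2 C\left(\frac{s}{n_S}, F_\tau(t_i)\right) - \partial_2 C\left(\frac{s-1}{n_S}, F_\tau(t_i)\right)
\end{aligned}
$$

In the case of the Rosen-Saunders model, we use the expression of the conditional Normal copula given on page 737:

$$
\partial_2 C(u, v; \rho_{m,c}) = \Phi\left(\frac{\Phi^{-1}(u) - \rho_{m,c}\,\Phi^{-1}(v)}{\sqrt{1 - \rho_{m,c}^2}}\right)
$$

33 By definition, we have $\hat{F}_\omega(\omega_s) = s/n_S$ because the scenarios are ordered.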
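The conditional weights $\pi_{s \mid i}$ of the ordered-scenario copula model only require the univariate Normal distribution. A minimal sketch, with our own function names and the assumption that the scenarios are sorted by increasing mark-to-market, is:

```python
from statistics import NormalDist

_NORMAL = NormalDist()

def cond_normal_copula(u, v, rho):
    """Conditional Normal copula: Phi((Phi^-1(u) - rho * Phi^-1(v)) / sqrt(1 - rho^2)).

    v must lie strictly between 0 and 1; the limits u = 0 and u = 1 are handled explicitly.
    """
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    z = (_NORMAL.inv_cdf(u) - rho * _NORMAL.inv_cdf(v)) / (1.0 - rho * rho) ** 0.5
    return _NORMAL.cdf(z)

def conditional_weights(n_s, v, rho):
    """pi_{s|i} for s = 1, ..., n_s, given v = F_tau(t_i) and ordered scenarios."""
    return [
        cond_normal_copula(s / n_s, v, rho) - cond_normal_copula((s - 1) / n_s, v, rho)
        for s in range(1, n_s + 1)
    ]

if __name__ == "__main__":
    print([round(p, 4) for p in conditional_weights(4, 0.05, 0.0)])  # uniform weights
    print([round(p, 4) for p in conditional_weights(4, 0.05, 0.5)])  # tilted weights
```

When $\rho_{m,c} = 0$, each scenario receives the weight $1/n_S$ and we retrieve the independent case. Otherwise, the weights are tilted towards low or high exposure scenarios depending on the sign of the correlation.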
The hazard rate approach In Basel II, wrong way risk is addressed by introducing the multiplier $\alpha = 1.4$, which is equivalent to changing the values of the mark-to-market. In the Rosen-Saunders model, wrong way risk is modeled by changing the joint probability of the mark-to-market and the default times. Hull and White (2012) propose a third approach, which consists in changing the values of the default probabilities. They consider that the hazard rate is a deterministic function of the mark-to-market: $\lambda(t) = \lambda(t, \mathrm{MtM}(t))$. For instance, they use two models:

$$
\lambda(t, \mathrm{MtM}(t)) = e^{a(t) + b \cdot \mathrm{MtM}(t)} \tag{4.24}
$$

and:

$$
\lambda(t, \mathrm{MtM}(t)) = \ln\left(1 + e^{a(t) + b \cdot \mathrm{MtM}(t)}\right) \tag{4.25}
$$

The case $b < 0$ corresponds to the right way risk, whereas $b > 0$ corresponds to the wrong way risk. When $b = 0$, the counterparty exposure is independent from the credit risk of the counterparty.


Hull and White (2012) propose a two-step procedure to calibrate $a(t)$ and $b$. First, they assume that the term structure of the hazard rate is flat. Given two pairs $(\mathrm{MtM}_1, s_1)$ and $(\mathrm{MtM}_2, s_2)$, $a(0)$ and $b$ satisfy the following system of equations:

$$
\left\{
\begin{aligned}
(1 - R) \cdot \lambda(0, \mathrm{MtM}_1) &= s_1 \\
(1 - R) \cdot \lambda(0, \mathrm{MtM}_2) &= s_2
\end{aligned}
\right.
$$

The solution is:

$$
\left\{
\begin{aligned}
b &= \frac{\ln \lambda_2 - \ln \lambda_1}{\mathrm{MtM}_2 - \mathrm{MtM}_1} \\
a(0) &= \ln \lambda_1 - b \cdot \mathrm{MtM}_1
\end{aligned}
\right.
$$

where $\lambda_i = s_i / (1 - R)$ for Model (4.24) and $\lambda_i = \exp\left(s_i / (1 - R)\right) - 1$ for Model (4.25). Hull and White (2012) consider the following example. They assume that the 5Y CDS spread of the counterparty is 300 bps when the mark-to-market is $3 mn, and 600 bps when the mark-to-market is $20 mn. If the recovery rate is set to 40%, the calibrated parameters are $a(0) = -3.1181$ and $b = 0.0408$ for Model (4.24) and $a(0) = -3.0974$ and $b = 0.0423$ for Model (4.25). The second step of the procedure consists in calibrating the function $a(t)$ given the value of $b$ estimated at the first step. Since we have:

$$
S(t) = \exp\left(-\int_0^t \lambda(s, \mathrm{MtM}(s))\,\mathrm{d}s\right)
$$

and:

$$
S(t) = \exp\left(-\frac{s(t) \cdot t}{1 - R}\right)
$$

the function $a(t)$ must verify that the survival probability calculated with the model is equal to the survival probability calculated with the credit spread:

$$
\exp\left(-\sum_{k=0}^{i} \lambda(t_k, \mathrm{MtM}(t_k)) \cdot (t_k - t_{k-1})\right) = \exp\left(-\frac{s(t_i) \cdot t_i}{1 - R}\right)
$$

In the case where the CVA is calculated with the Monte Carlo method, we have:

$$
\frac{1}{n_S} \sum_{s=1}^{n_S} \prod_{k=0}^{i} \exp\left(-\lambda(t_k, \omega_s(t_k)) \cdot (t_k - t_{k-1})\right) = \exp\left(-\frac{s(t_i) \cdot t_i}{1 - R}\right)
$$

where $\omega_s(t_k)$ is the $s$-th simulated value of $\mathrm{MtM}(t_k)$. Therefore, $a(t)$ is specified as a piecewise linear function and we use the bootstrap method³⁴ for calibrating $a(t)$ given the available market CDS spreads³⁵.
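The first calibration step has a closed-form solution and can be checked against the numerical example above. In the sketch below, the function names and the labels `"exp"` (Model 4.24) and `"log1p"` (Model 4.25) are ours.

```python
import math

def calibrate_flat(mtm1, s1, mtm2, s2, R, model="exp"):
    """Solve (1 - R) * lambda(0, MtM_i) = s_i, i = 1, 2, for a(0) and b."""
    if model == "exp":          # Model (4.24)
        l1, l2 = s1 / (1.0 - R), s2 / (1.0 - R)
    elif model == "log1p":      # Model (4.25)
        l1, l2 = math.expm1(s1 / (1.0 - R)), math.expm1(s2 / (1.0 - R))
    else:
        raise ValueError("unknown model")
    b = (math.log(l2) - math.log(l1)) / (mtm2 - mtm1)
    a0 = math.log(l1) - b * mtm1
    return a0, b

def hazard_rate(a, b, mtm, model="exp"):
    """lambda(t, MtM) under Model (4.24) or Model (4.25)."""
    x = a + b * mtm
    return math.exp(x) if model == "exp" else math.log1p(math.exp(x))

if __name__ == "__main__":
    for model in ("exp", "log1p"):
        a0, b = calibrate_flat(3.0, 0.03, 20.0, 0.06, R=0.4, model=model)
        print(model, round(a0, 4), round(b, 4))
```

The second step (bootstrapping a piecewise linear $a(t)$ on the CDS term structure) requires the simulated mark-to-market paths and is not covered by this sketch.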


4.3 Collateral risk 
4.3.1 Definition 


When there is a margin agreement, the counterparty needs to post collateral and the exposure at default becomes:

$$
e^+(t) = \max\left(\mathrm{MtM}(t) - C(t), 0\right) \tag{4.26}
$$

where $C(t)$ is the collateral value at time $t$. Generally, the collateral transfer occurs when the mark-to-market exceeds a threshold $H$:

$$
C(t) = \max\left(\mathrm{MtM}(t - \delta_C) - H, 0\right) \tag{4.27}
$$

$H$ is the minimum collateral transfer amount whereas $\delta_C \ge 0$ is the margin period of risk (MPOR). According to the Financial Conduct Authority (FCA), the margin period of risk “stands for the time period from the most recent exchange of collateral covering a netting set of financial instruments with a defaulting counterparty until the financial instruments are closed out and the resulting market risk is re-hedged”. It can be seen as the necessary time period for posting the collateral. In many models, $\delta_C$ is set to zero in order to obtain analytical formulas. However, this is not realistic from a practical point of view. From a regulatory point of view, $\delta_C$ is generally set to five or ten days (Cont, 2018).


If we combine Equations (4.26) and (4.27), it follows that:

$$
\begin{aligned}
e^+(t) ={} & \max\left(\mathrm{MtM}(t) - \max\left(\mathrm{MtM}(t - \delta_C) - H, 0\right), 0\right) \\
={} & \mathrm{MtM}(t) \cdot \mathbf{1}\{0 \le \mathrm{MtM}(t),\ \mathrm{MtM}(t - \delta_C) < H\} + {} \\
& \left(\mathrm{MtM}(t) - \mathrm{MtM}(t - \delta_C) + H\right) \cdot \mathbf{1}\{H \le \mathrm{MtM}(t - \delta_C) \le \mathrm{MtM}(t) + H\}
\end{aligned}
$$

We obtain some special cases:

• When $H = +\infty$, $C(t)$ is equal to zero and we obtain:

$$
e^+(t) = \max\left(\mathrm{MtM}(t), 0\right)
$$

• When $H = 0$, the collateral $C(t)$ is equal to $\mathrm{MtM}(t - \delta_C)$ and the counterparty exposure becomes:

$$
e^+(t) = \max\left(\mathrm{MtM}(t) - \mathrm{MtM}(t - \delta_C), 0\right) = \max\left(\mathrm{MtM}(t - \delta_C, t), 0\right)
$$

The counterparty credit risk corresponds to the variation of the mark-to-market $\mathrm{MtM}(t - \delta_C, t)$ during the liquidation period $[t - \delta_C, t]$.

• When $\delta_C$ is set to zero, we deduce that:

$$
\begin{aligned}
e^+(t) &= \max\left(\mathrm{MtM}(t) - \max\left(\mathrm{MtM}(t) - H, 0\right), 0\right) \\
&= \mathrm{MtM}(t) \cdot \mathbf{1}\{0 \le \mathrm{MtM}(t) < H\} + H \cdot \mathbf{1}\{H \le \mathrm{MtM}(t)\}
\end{aligned}
$$

• When $\delta_C$ is set to zero and there is no minimum collateral transfer amount, the counterparty credit risk vanishes:

$$
e^+(t) = 0
$$

This last case is interesting, because it gives an indication how to reduce the counterparty risk:

$$
H \searrow 0 \ \text{or}\ \delta_C \searrow 0 \implies e^+(t) \searrow 0
$$

34 This method is presented on page 204.

35 Generally, they correspond to the following maturities: 1Y, 3Y, 5Y, 7Y and 10Y.


In the first panel in Figure 4.10, we have simulated the mark-to-market of a portfolio for a two-year period. In the second panel, we have reported the counterparty exposure when there is no collateral. The other panels show the collateral $C(t)$ and the counterparty exposure $e^+(t)$ for different values of $\delta_C$ and $H$. When there is no margin period of risk, we verify that the exposure is capped at the collateral threshold $H$ in the fourth panel. When the threshold is equal to zero, the counterparty exposure corresponds to the lag effect due to the margin period of risk as illustrated in the sixth panel. The riskiest situation corresponds to the combination of the threshold risk and the margin period of risk (eighth panel).
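Equations (4.26) and (4.27) can be applied to a mark-to-market path observed on a regular grid, the margin period of risk being expressed as a number of grid steps. The sketch below uses our own function names and one additional assumption: no collateral is held before the first date for which a lagged mark-to-market is available.

```python
def collateral(mtm, i, lag, H):
    """C(t_i) = max(MtM(t_i - delta_C) - H, 0), with delta_C equal to `lag` grid steps."""
    j = i - lag
    if j < 0:
        return 0.0  # assumption: no collateral before the first observable date
    return max(mtm[j] - H, 0.0)

def exposure_path(mtm, lag, H):
    """e+(t_i) = max(MtM(t_i) - C(t_i), 0) along the path."""
    return [max(m - collateral(mtm, i, lag, H), 0.0) for i, m in enumerate(mtm)]

if __name__ == "__main__":
    path = [0.0, 2.0, 5.0, 3.0, -1.0, 4.0]          # illustrative mark-to-market path
    print(exposure_path(path, lag=0, H=float("inf")))  # no collateral
    print(exposure_path(path, lag=0, H=3.0))           # exposure capped at H
    print(exposure_path(path, lag=1, H=0.0))           # pure lag effect
    print(exposure_path(path, lag=0, H=0.0))           # exposure vanishes
```

The four calls correspond to the special cases listed above.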
FIGURE 4.10: Impact of collateral on the counterparty exposure


4.3.2 Capital allocation 


Taking into account collateral in the CVA computation is relatively straightforward when we use Monte Carlo simulations. In fact, the CVA formula remains the same, only the computation of the expected positive exposure $\mathrm{EpE}(t)$ is changed. However, as mentioned by Pykhtin and Rosen (2010), the big issue is the allocation of the capital. In Section 2.3 on page 104, we have seen that the capital allocation is given by the Euler allocation principle. Let $\mathcal{R}(w)$ be the risk measure of Portfolio $w = (w_1, \ldots, w_n)$. Under some assumptions, we reiterate that:

$$
\mathcal{R}(w) = \sum_{i=1}^{n} \mathrm{RC}_i
$$

where $\mathrm{RC}_i$ is the risk contribution of the $i$-th component:

$$
\mathrm{RC}_i = w_i \cdot \frac{\partial\,\mathcal{R}(w)}{\partial\,w_i}
$$

The components can be assets, credits, trading desks, etc. For instance, in the case of credit risk, the IRB formula gives the risk contribution of a loan within a portfolio. In the case of a CVA portfolio, we have:

$$
\mathrm{CVA}(w) = (1 - R_B) \cdot \int_0^T -B_0(t)\,\mathrm{EpE}(t; w)\,\mathrm{d}S_B(t)
$$

where $\mathrm{EpE}(t; w)$ is the expected positive exposure with respect to the portfolio $w$. The Euler allocation principle becomes:

$$
\mathrm{CVA}(w) = \sum_{i=1}^{n} \mathrm{CVA}_i(w)
$$

where $\mathrm{CVA}_i(w)$ is the CVA risk contribution of the $i$-th component:

$$
\mathrm{CVA}_i(w) = (1 - R_B) \cdot \int_0^T -B_0(t)\,\mathrm{EpE}_i(t; w)\,\mathrm{d}S_B(t)
$$

and $\mathrm{EpE}_i(t; w)$ is the EpE risk contribution of the $i$-th component:

$$
\mathrm{EpE}_i(t; w) = w_i \cdot \frac{\partial\,\mathrm{EpE}(t; w)}{\partial\,w_i}
$$

Therefore, the difficulty for computing the CVA risk contribution is to compute the EpE risk contribution.
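We consider the portfolio $w = (w_1, \ldots, w_n)$, which is composed of $n$ OTC contracts. The mark-to-market of the portfolio is equal to:

$$
\mathrm{MtM}(t) = \sum_{i=1}^{n} w_i \cdot \mathrm{MtM}_i(t)
$$

where $\mathrm{MtM}_i(t)$ is the mark-to-market for the contract $\mathcal{C}_i$. In the general case, the counterparty exposure is given by:

$$
e^+(t) = \mathrm{MtM}(t) \cdot \mathbf{1}\{0 \le \mathrm{MtM}(t) < H\} + H \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge H\}
$$

If there is no collateral, we have:

$$
\begin{aligned}
e^+(t) &= \mathrm{MtM}(t) \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge 0\} \\
&= \sum_{i=1}^{n} w_i \cdot \mathrm{MtM}_i(t) \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge 0\}
\end{aligned}
$$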
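We deduce that:

$$
\frac{\partial\,\mathrm{EpE}(t; w)}{\partial\,w_i} = \mathbb{E}\left[\mathrm{MtM}_i(t) \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge 0\}\right]
$$

and:

$$
\mathrm{EpE}_i(t; w) = \mathbb{E}\left[w_i \cdot \mathrm{MtM}_i(t) \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge 0\}\right]
$$

Computing the EpE (or CVA) risk contribution is then straightforward in this case. In the general case, Pykhtin and Rosen (2010) notice that $\mathrm{EpE}(t; w)$ is not a homogeneous function of degree one because of the second term $\mathbb{E}\left[H \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge H\}\right]$. The idea of these authors is then to allocate the threshold risk to the individual contracts:

$$
\mathbb{E}\left[H \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge H\}\right] = H \cdot \sum_{i=1}^{n} \mathbb{E}\left[\omega_i \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge H\}\right]
$$

by choosing an appropriate value of $\omega_i$ such that $\sum_{i=1}^{n} \omega_i = 1$. They consider two propositions. Type A Euler allocation is given by:

$$
\begin{aligned}
\mathrm{EpE}_i(t; w) ={} & \mathbb{E}\left[w_i \cdot \mathrm{MtM}_i(t) \cdot \mathbf{1}\{0 \le \mathrm{MtM}(t) < H\}\right] + {} \\
& H \cdot \frac{\mathbb{E}\left[\mathbf{1}\{\mathrm{MtM}(t) \ge H\}\right] \cdot \mathbb{E}\left[w_i \cdot \mathrm{MtM}_i(t) \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge H\}\right]}{\mathbb{E}\left[\mathrm{MtM}(t) \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge H\}\right]}
\end{aligned}
$$

whereas type B Euler allocation is given by:

$$
\begin{aligned}
\mathrm{EpE}_i(t; w) ={} & \mathbb{E}\left[w_i \cdot \mathrm{MtM}_i(t) \cdot \mathbf{1}\{0 \le \mathrm{MtM}(t) < H\}\right] + {} \\
& H \cdot \mathbb{E}\left[\frac{w_i \cdot \mathrm{MtM}_i(t)}{\mathrm{MtM}(t)} \cdot \mathbf{1}\{\mathrm{MtM}(t) \ge H\}\right]
\end{aligned}
$$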
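Both allocation rules are easy to estimate from simulated scenarios. The sketch below is an illustration with hypothetical inputs: the function names are ours, the contract values are drawn as independent Gaussian variables (an assumption made only to generate scenarios), and the threshold $H$ is assumed to be strictly positive.

```python
import random

def simulate(n_sims, mu, sigma, seed=0):
    """Independent Gaussian mark-to-market values per contract (illustrative only)."""
    rng = random.Random(seed)
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            for _ in range(n_sims)]

def epe_contributions(scenarios, w, H, kind="A"):
    """Monte Carlo estimate of the type A or type B EpE contributions (H > 0)."""
    n, n_s = len(w), len(scenarios)
    below = [0.0] * n   # E[w_i MtM_i 1{0 <= MtM < H}]
    above = [0.0] * n   # A: E[w_i MtM_i 1{MtM >= H}]; B: E[w_i MtM_i / MtM 1{MtM >= H}]
    p_above = 0.0       # E[1{MtM >= H}]
    for x in scenarios:
        parts = [wi * xi for wi, xi in zip(w, x)]
        total = sum(parts)
        if 0.0 <= total < H:
            for i in range(n):
                below[i] += parts[i] / n_s
        elif total >= H:
            p_above += 1.0 / n_s
            for i in range(n):
                above[i] += (parts[i] if kind == "A" else parts[i] / total) / n_s
    if kind == "A":
        denom = sum(above)  # E[MtM 1{MtM >= H}]
        threshold = [H * p_above * a / denom if denom else 0.0 for a in above]
    else:
        threshold = [H * a for a in above]
    return [b + t for b, t in zip(below, threshold)]

if __name__ == "__main__":
    scen = simulate(20000, mu=[1.0, 0.5], sigma=[1.0, 2.0], seed=42)
    for kind in ("A", "B"):
        print(kind, [round(c, 4) for c in epe_contributions(scen, [1.0, 1.0], 1.5, kind)])
```

In both cases, the contributions add up to the estimated $\mathrm{EpE}(t; w)$, which is the property required by the Euler principle.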


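Pykhtin and Rosen (2010) consider the Gaussian case when the mark-to-market for the contract $\mathcal{C}_i$ is given by:

$$
\mathrm{MtM}_i(t) = \mu_i(t) + \sigma_i(t)\,X_i
$$

where $(X_1, \ldots, X_n) \sim \mathcal{N}(\mathbf{0}_n, \rho)$ and $\rho = (\rho_{i,j})$ is the correlation matrix. Let $\mu_w(t)$ and $\sigma_w(t)$ be the expected value and volatility of the portfolio mark-to-market $\mathrm{MtM}(t)$. The authors show that³⁶ the expected positive exposure is the sum of three components:

$$
\mathrm{EpE}(t; w) = \mathrm{EpE}_\mu(t; w) + \mathrm{EpE}_\sigma(t; w) + \mathrm{EpE}_H(t; w)
$$

where $\mathrm{EpE}_\mu(t; w)$ is the mean component:

$$
\mathrm{EpE}_\mu(t; w) = \mu_w(t) \cdot \left(\Phi\left(\frac{\mu_w(t)}{\sigma_w(t)}\right) - \Phi\left(\frac{\mu_w(t) - H}{\sigma_w(t)}\right)\right)
$$

$\mathrm{EpE}_\sigma(t; w)$ is the volatility component:

$$
\mathrm{EpE}_\sigma(t; w) = \sigma_w(t) \cdot \left(\phi\left(\frac{\mu_w(t)}{\sigma_w(t)}\right) - \phi\left(\frac{\mu_w(t) - H}{\sigma_w(t)}\right)\right)
$$

and $\mathrm{EpE}_H(t; w)$ is the collateral threshold component:

$$
\mathrm{EpE}_H(t; w) = H \cdot \Phi\left(\frac{\mu_w(t) - H}{\sigma_w(t)}\right)
$$

36 See Exercise 4.4.6 on page 303.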
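We notice that $\mathrm{EpE}_\mu(t; w)$ and $\mathrm{EpE}_H(t; w)$ are always positive (for a positive expected mark-to-market $\mu_w(t)$), while $\mathrm{EpE}_\sigma(t; w)$ may be positive or negative. When there is no collateral agreement, $\mathrm{EpE}_H(t; w)$ is equal to zero and $\mathrm{EpE}(t; w)$ depends on the ratio $\mu_w(t) / \sigma_w(t)$. Concerning the risk contributions, Pykhtin and Rosen (2010) obtain a similar decomposition:

$$
\mathrm{EpE}_i(t; w) = \mathrm{EpE}_{\mu,i}(t; w) + \mathrm{EpE}_{\sigma,i}(t; w) + \mathrm{EpE}_{H,i}(t; w)
$$

where:

$$
\begin{aligned}
\mathrm{EpE}_{\mu,i}(t; w) &= w_i\,\mu_i(t) \cdot \left(\Phi\left(\frac{\mu_w(t)}{\sigma_w(t)}\right) - \Phi\left(\frac{\mu_w(t) - H}{\sigma_w(t)}\right)\right) \\
\mathrm{EpE}_{\sigma,i}(t; w) &= w_i\,\gamma_i(t)\,\sigma_i(t) \cdot \left(\phi\left(\frac{\mu_w(t)}{\sigma_w(t)}\right) - \phi\left(\frac{\mu_w(t) - H}{\sigma_w(t)}\right)\right) \\
\mathrm{EpE}_{H,i}(t; w) &= H \cdot \Phi\left(\frac{\mu_w(t) - H}{\sigma_w(t)}\right) \cdot \frac{\psi_i}{\psi_w}
\end{aligned}
$$

where $\gamma_i(t) = \sigma_w(t)^{-1} \sum_{j=1}^{n} w_j\,\rho_{i,j}\,\sigma_j(t)$ and:

$$
\frac{\psi_i}{\psi_w} = \frac{w_i\,\mu_i(t)\,\Phi\left(\dfrac{\mu_w(t) - H}{\sigma_w(t)}\right) + w_i\,\gamma_i(t)\,\sigma_i(t)\,\phi\left(\dfrac{\mu_w(t) - H}{\sigma_w(t)}\right)}{\mu_w(t)\,\Phi\left(\dfrac{\mu_w(t) - H}{\sigma_w(t)}\right) + \sigma_w(t)\,\phi\left(\dfrac{\mu_w(t) - H}{\sigma_w(t)}\right)}
$$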

Example 50 We consider a portfolio of two contracts \(\mathcal{C}_1\) and \(\mathcal{C}_2\) with the following
characteristics: \(\mu_1(t)\) = $1 mn, \(\sigma_1(t)\) = $1 mn, \(\mu_2(t)\) = $1 mn, \(\sigma_2(t)\) = $1 mn and \(\rho_{1,2}\) = 0%.

We first calculate the expected positive exposure \(\mathrm{EpE}(t; w)\) when we change the value
of \(\mu_2(t)\) and there is no collateral agreement. Results are given in Figure 4.11. In the first
panel, we observe that \(\mathrm{EpE}(t; w)\) increases with respect to \(\mu_2(t)\). We notice that the mean
component is the most important contributor when the expected value of the portfolio
mark-to-market is high and positive³⁷:

\[
\mu_w(t) \to \infty \Rightarrow \left\{
\begin{array}{l}
\mathrm{EpE}_{\mu}(t; w) \to \mathrm{EpE}(t; w) \\
\mathrm{EpE}_{\sigma}(t; w) \to 0
\end{array}
\right.
\]

The risk contributions \(\mathrm{EpE}_1(t; w)\) and \(\mathrm{EpE}_2(t; w)\) are given in the second panel of Figure
4.11. The risk contribution of the second contract is negative when \(\mu_2(t)\) is less than −1. This
illustrates the diversification effect, implying that some trades can contribute negatively to
the CVA risk. This is why the concept of netting sets is important when computing the
CVA capital charge. In Figure 4.12, we have done the same exercise when we consider
different values of the correlation \(\rho_{1,2}\). We observe that the impact of this parameter is not
very important except when the correlation is negative. The reason is that the correlation
matrix has an impact on the volatility \(\sigma_w(t)\) of the portfolio mark-to-market, but not on
the expected value \(\mu_w(t)\). We now consider that \(\mu_1(t) = \mu_2(t) = 1\), \(\sigma_1(t) = \sigma_2(t) = 1\) and
\(\rho_{1,2} = 0\). In Figure 4.13, we analyze the impact of the collateral threshold H. We notice
that having a tighter collateral agreement (or a lower threshold H) reduces the
counterparty exposure. However, this reduction is not uniform. It is very important
when H is close to zero, but there is no impact when H is large.
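
The decomposition and the associated risk contributions can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the normal cdf and pdf are built from the standard library, and the threshold term is allocated pro rata with the weights \(\psi_i / \psi_w\). It uses the parameters of Example 50 (unit weights are assumed) and a threshold H = 1.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def portfolio_moments(w, mu, sigma, rho):
    """Mean and volatility of MtM(t) = sum_i w_i * MtM_i(t)."""
    n = len(w)
    mu_w = sum(w[i] * mu[i] for i in range(n))
    var_w = sum(w[i] * w[j] * rho[i][j] * sigma[i] * sigma[j]
                for i in range(n) for j in range(n))
    return mu_w, math.sqrt(var_w)


def epe_components(mu_w, sigma_w, H=math.inf):
    """Return (EpE_mu, EpE_sigma, EpE_H); H = inf means no collateral."""
    a = mu_w / sigma_w
    if math.isinf(H):
        return mu_w * norm_cdf(a), sigma_w * norm_pdf(a), 0.0
    b = (mu_w - H) / sigma_w
    return (mu_w * (norm_cdf(a) - norm_cdf(b)),
            sigma_w * (norm_pdf(a) - norm_pdf(b)),
            H * norm_cdf(b))


def risk_contributions(w, mu, sigma, rho, H=math.inf):
    """Per-contract contributions; the threshold term is split with psi_i / psi_w."""
    n = len(w)
    mu_w, sigma_w = portfolio_moments(w, mu, sigma, rho)
    a = mu_w / sigma_w
    if math.isinf(H):
        cdf_b, pdf_b, h_term = 0.0, 0.0, 0.0
    else:
        b = (mu_w - H) / sigma_w
        cdf_b, pdf_b = norm_cdf(b), norm_pdf(b)
        h_term = H * cdf_b
    psi_w = mu_w * cdf_b + sigma_w * pdf_b
    contributions = []
    for i in range(n):
        # gamma_i is the correlation between MtM_i(t) and MtM(t)
        gamma_i = sum(w[j] * rho[i][j] * sigma[j] for j in range(n)) / sigma_w
        rc_mu = w[i] * mu[i] * (norm_cdf(a) - cdf_b)
        rc_sigma = w[i] * gamma_i * sigma[i] * (norm_pdf(a) - pdf_b)
        psi_i = w[i] * mu[i] * cdf_b + w[i] * gamma_i * sigma[i] * pdf_b
        rc_h = h_term * psi_i / psi_w if psi_w != 0.0 else 0.0
        contributions.append(rc_mu + rc_sigma + rc_h)
    return contributions


# Example 50: unit means and volatilities, zero correlation (unit weights assumed)
w, mu, sigma = [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]
rho = [[1.0, 0.0], [0.0, 1.0]]
mu_w, sigma_w = portfolio_moments(w, mu, sigma, rho)
epe_no_collateral = sum(epe_components(mu_w, sigma_w))
epe_with_threshold = sum(epe_components(mu_w, sigma_w, H=1.0))
rc_with_threshold = risk_contributions(w, mu, sigma, rho, H=1.0)
```

By construction, the contributions add up to the portfolio exposure, and the two contracts receive the same contribution because they are identical.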


³⁷In this limit case, we obtain:

\[
\mathrm{EpE}(t; w) = \mu_w(t) = \sum_{i=1}^{n} w_i \mu_i(t)
\]

FIGURE 4.11: Impact of \(\mu_i(t) / \sigma_i(t)\) on the counterparty exposure (first panel: counterparty
exposure; second panel: risk contribution)

FIGURE 4.12: Impact of the correlation on the counterparty exposure

FIGURE 4.13: Decomposition of the counterparty exposure when there is a collateral
agreement

FIGURE 4.14: Optimal collateral threshold

The impact of the threshold can be measured by the ratio:

\[
\delta(H) = \frac{\mathrm{EpE}(t; w, \infty) - \mathrm{EpE}(t; w, H)}{\mathrm{EpE}(t; w, \infty)}
\]

where \(\mathrm{EpE}(t; w, H)\) is the expected positive exposure for a given threshold H. If we would
like to reduce the counterparty exposure by \(\delta^{\star}\), we have to solve the non-linear equation
\(\delta(H) = \delta^{\star}\) in order to find the optimal value \(H^{\star}\). We can also approximate \(\mathrm{EpE}(t; w, H)\)
by its mean contribution:

\[
\delta(H) \approx \delta_{\mu}(H) = \frac{\mu_w(t) \cdot \Phi\left( \dfrac{\mu_w(t) - H}{\sigma_w(t)} \right)}{\mathrm{EpE}_{\mu}(t; w, \infty)}
\]

In this case, the solution of the non-linear equation \(\delta_{\mu}(H) = \delta^{\star}\) is equal to³⁸:

\[
H^{\star} = \mu_w(t) - \sigma_w(t) \cdot \Phi^{-1}\left( \frac{\mathrm{EpE}_{\mu}(t; w, \infty)}{\mu_w(t)} \cdot \delta^{\star} \right)
\]

The computation of \(H^{\star}\) is then straightforward since we have only to calculate \(\mu_w(t)\), \(\sigma_w(t)\)
and the mean contribution \(\mathrm{EpE}_{\mu}(t; w, \infty)\) when there is no collateral agreement. However,
the value of \(H^{\star}\) is overestimated because \(\mathrm{EpE}_{\mu}(t; w, H)\) is lower than \(\mathrm{EpE}(t; w, H)\). A rule
of thumb is then to adjust the solution \(H^{\star}\) by a factor³⁹, which is generally equal to 0.75.

In Figure 4.14, we have represented the optimal collateral threshold \(H^{\star}\) for the previous
example.
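
To illustrate the difference between the exact threshold and the mean-component approximation, here is a minimal Python sketch. The function names, the bisection solver and the target reduction of 50% are assumptions chosen for illustration; the closed-form approximation requires \(\mu_w(t) > 0\).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def epe(mu_w, sigma_w, H=math.inf):
    """Expected positive exposure for a threshold H (inf = no collateral)."""
    a = mu_w / sigma_w
    if math.isinf(H):
        return mu_w * STD_NORMAL.cdf(a) + sigma_w * STD_NORMAL.pdf(a)
    b = (mu_w - H) / sigma_w
    return (mu_w * (STD_NORMAL.cdf(a) - STD_NORMAL.cdf(b))
            + sigma_w * (STD_NORMAL.pdf(a) - STD_NORMAL.pdf(b))
            + H * STD_NORMAL.cdf(b))


def delta(mu_w, sigma_w, H):
    """Relative reduction of the exposure obtained with the threshold H."""
    epe_inf = epe(mu_w, sigma_w)
    return (epe_inf - epe(mu_w, sigma_w, H)) / epe_inf


def threshold_mean_approx(mu_w, sigma_w, delta_star):
    """Closed-form H* when EpE is approximated by its mean component."""
    epe_mu_inf = mu_w * STD_NORMAL.cdf(mu_w / sigma_w)
    return mu_w - sigma_w * STD_NORMAL.inv_cdf(delta_star * epe_mu_inf / mu_w)


def threshold_exact(mu_w, sigma_w, delta_star, tol=1e-10):
    """Solve delta(H) = delta_star by bisection (delta decreases from 1 to 0)."""
    lo, hi = 0.0, abs(mu_w) + 10.0 * sigma_w
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta(mu_w, sigma_w, mid) > delta_star:
            lo = mid  # reduction still too large: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Portfolio of the previous example: mu_w = 2 and sigma_w = sqrt(2)
mu_w, sigma_w, delta_star = 2.0, math.sqrt(2.0), 0.5
h_approx = threshold_mean_approx(mu_w, sigma_w, delta_star)
h_exact = threshold_exact(mu_w, sigma_w, delta_star)
h_rule_of_thumb = 0.75 * h_approx
```

With these inputs, the approximated threshold is larger than the exact one, which is consistent with the overestimation discussed above; the 0.75 adjustment moves it toward the exact value without matching it.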


4.4 Exercises 
4.4.1 Impact of netting agreements in counterparty credit risk 


The table below gives the current mark-to-market of 7 OTC contracts between Bank A 
and Bank B: 


      |      Equity       | Fixed income |     FX
      |  C1    C2    C3   |  C4     C5   |  C6    C7
   A  | +10    −5    +6   | +17     −5   |  −5    +1
   B  | −11    +6    −3   | −12     +9   |  +5    +1

The table should be read as follows: Bank A has a mark-to-market equal to +10 for the
contract \(\mathcal{C}_1\) whereas Bank B has a mark-to-market equal to −11 for the same contract,
Bank A has a mark-to-market equal to −5 for the contract \(\mathcal{C}_2\) whereas Bank B has a
mark-to-market equal to +6 for the same contract, etc.

1. (a) Explain why there are differences between the MtM values of the same OTC
contract.
(b) Calculate the exposure at default of Bank A.
(c) Same question if there is a global netting agreement.
(d) Same question if the netting agreement only concerns equity products.


³⁸The solution \(H^{\star}\) can be viewed as a quantile of the probability distribution of the portfolio
mark-to-market: \(\mathrm{MtM}(t) \sim \mathcal{N}(\mu_w(t), \sigma_w^2(t))\).
³⁹The underlying idea is that \(\mathrm{EpE}_{\mu}(t; w, H) \approx 75\% \cdot \mathrm{EpE}(t; w, H)\).

2. In the following, we measure the impact of netting agreements on the exposure at
default.

(a) We consider an OTC contract \(\mathcal{C}\) between Bank A and Bank B. The mark-to-market
\(\mathrm{MtM}_1(t)\) of Bank A for the contract \(\mathcal{C}\) is defined as follows:

\[
\mathrm{MtM}_1(t) = x_1 + \sigma_1 W_1(t)
\]

where \(W_1(t)\) is a Brownian motion. Calculate the potential future exposure of
Bank A.

(b) We consider a second OTC contract between Bank A and Bank B. The mark-to-market
is also given by the following expression:

\[
\mathrm{MtM}_2(t) = x_2 + \sigma_2 W_2(t)
\]

where \(W_2(t)\) is a second Brownian motion that is correlated with \(W_1(t)\). Let
\(\rho\) be this correlation such that \(\mathbb{E}[W_1(t) W_2(t)] = \rho t\). Calculate the expected
exposure of Bank A if there is no netting agreement.

(c) Same question when there is a global netting agreement between Bank A and
Bank B.

(d) Comment on these results.


4.4.2 Calculation of the effective expected positive exposure 


We denote by e (t) the potential future exposure of an OTC contract with maturity T. 
The current date is set to t = 0. 


1. Define the concepts of peak exposure \(\mathrm{PE}_{\alpha}(t)\), maximum peak exposure \(\mathrm{MPE}_{\alpha}(0; t)\),
expected exposure \(\mathrm{EE}(t)\), expected positive exposure \(\mathrm{EPE}(0; t)\), effective expected
exposure \(\mathrm{EEE}(t)\) and effective expected positive exposure \(\mathrm{EEPE}(0; t)\).

2. Calculate these different quantities when the potential future exposure is
\(e(t) = \sigma \cdot \sqrt{t} \cdot X\) where \(X \sim \mathcal{U}_{[0,1]}\).

3. Same question when \(e(t) = \exp(\sigma \cdot \sqrt{t} \cdot X)\) where \(X \sim \mathcal{N}(0, 1)\).


4. Same question when \(e(t) = \sigma \cdot \left( t^3 - \frac{7}{3} T t^2 + \frac{4}{3} T^2 t \right) \cdot X\) where \(X \sim \mathcal{U}_{[0,1]}\).


5. Same question when \(e(t) = \sigma \cdot \sqrt{t} \cdot X\) where X is a random variable defined on [0, 1]
with the following probability density function⁴⁰:

6. Comment on these results.

⁴⁰We assume that a > 0.

4.4.3 Calculation of the capital charge for counterparty credit risk

We denote by \(e(t)\) the potential future exposure of an OTC contract with maturity
T. The current date is set to t = 0. Let N and \(\sigma\) be the notional and the volatility of the
underlying contract. We assume that \(e(t) = N \cdot \sigma \cdot \sqrt{t} \cdot X\) where \(0 \leq X \leq 1\), \(\Pr\{X \leq x\} = x^{\gamma}\)
and \(\gamma > 0\).

1. Calculate the peak exposure \(\mathrm{PE}_{\alpha}(t)\), the expected exposure \(\mathrm{EE}(t)\) and the effective
expected positive exposure \(\mathrm{EEPE}(0; t)\).

2. The bank manages the credit risk with the foundation IRB approach and the
counterparty credit risk with an internal model. We consider an OTC contract with the
following parameters: N is equal to $3 mn, the maturity T is one year, the volatility
\(\sigma\) is set to 20% and \(\gamma\) is estimated at 2.

(a) Calculate the exposure at default EAD knowing that the bank uses the regulatory
value for the parameter \(\alpha\).

(b) The default probability of the counterparty is estimated at 1%. Calculate then
the capital charge for counterparty credit risk of this OTC contract⁴¹.


4.4.4 Calculation of CVA and DVA measures 


We consider an OTC contract with maturity T between Bank A and Bank B. We denote 
by MtM (t) the risk-free mark-to-market of Bank A. The current date is set to t = 0 and 
we assume that: 

\[
\mathrm{MtM}(t) = N \cdot \sigma \cdot \sqrt{t} \cdot X
\]

where N is the notional of the OTC contract, \(\sigma\) is the volatility of the underlying asset and
X is a random variable, which is defined on the support [−1, 1] and whose density function
is:

\[
f(x) = \frac{1}{2}
\]

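1. Define the concept of positive exposure \(e^{+}(t)\). Show that the cumulative distribution
function \(\mathbf{F}_{[0,t]}\) of \(e^{+}(t)\) has the following expression:

\[
\mathbf{F}_{[0,t]}(x) = \mathbb{1}\left\{ 0 \leq x \leq N \sigma \sqrt{t} \right\} \cdot \left( \frac{1}{2} + \frac{x}{2 \cdot N \cdot \sigma \cdot \sqrt{t}} \right)
\]

where \(\mathbf{F}_{[0,t]}(x) = 0\) if \(x < 0\) and \(\mathbf{F}_{[0,t]}(x) = 1\) if \(x > N \sigma \sqrt{t}\).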
2. Deduce the value of the expected positive exposure \(\mathrm{EpE}(t)\).

3. We note \(R_B\) the fixed and constant recovery rate of Bank B. Give the mathematical
expression of the CVA.

4. By using the definition of the lower incomplete gamma function \(\gamma(s, x)\), show that
the CVA is equal to:

\[
\mathrm{CVA} = \frac{N \cdot (1 - R_B) \cdot \sigma \cdot \gamma\left( \frac{3}{2}, \lambda_B T \right)}{4 \sqrt{\lambda_B}}
\]

when the default time of Bank B is exponential with parameter \(\lambda_B\) and interest rates
are equal to zero.

5. Comment on this result.

6. By assuming that the default time of Bank A is exponential with parameter \(\lambda_A\),
deduce the value of the DVA without additional computations.

⁴¹We will take a value of 70% for the LGD parameter and a value of 20% for the default correlation. We
can also use the approximations \(-1.06 \approx -1\) and \(\Phi(-1) \approx 16\%\).


4.4.5 Approximation of the CVA for an interest rate swap 
This exercise is based on the results of Syrkin and Shirazi (2015). 


1. Calculate \(\mathrm{EpE}(t) = \mathbb{E}[\max(\mathrm{MtM}(t), 0)]\) when the mark-to-market is equal to
\(\mathrm{MtM}(t) = A e^{X} - B\) and \(X \sim \mathcal{N}(\mu_X, \sigma_X^2)\).

2. We define the mark-to-market of the interest rate swap as follows:

\[
\mathrm{MtM}(t) = N \int_t^T f(t, T) B_t(s) \, \mathrm{d}s - N \int_t^T f(0, T) B_t(s) \, \mathrm{d}s
\]

where N and T are the notional and the maturity of the swap, and \(f(t, T)\) is the
instantaneous forward rate. Comment on this formulation. By assuming that \(f(t, T)\)
follows a geometric Brownian motion:

\[
\mathrm{d}f(t, T) = \mu f(t, T) \, \mathrm{d}t + \sigma f(t, T) \, \mathrm{d}W(t)
\]

and the yield curve is flat, \(B_t(s) = e^{-r(s - t)}\), calculate the value of the mark-to-market.
Deduce the confidence interval of \(\mathrm{MtM}(t)\) with a confidence level \(\alpha\).

3. Calculate the expected mark-to-market and the expected counterparty exposure.

4. Give the expression of the CVA at time t if we assume that the default time is
exponentially distributed: \(\tau \sim \mathcal{E}(\lambda)\).

5. Retrieve the approximation of the CVA found by Syrkin and Shirazi (2015).

6. We consider the following numerical values: N = 1000, \(f(0, T)\) = 5%, \(\mu\) = 2%,
\(\sigma\) = 25%, T = 10 years, \(\lambda\) = 1% and R = 50%.

(a) Calculate the 90% confidence interval of \(\mathrm{MtM}(t)\).
(b) Compare the time profile of \(\mathrm{EpE}(t)\) and \(\mathbb{E}[\mathrm{MtM}(t)]\).
(c) Compare the time profile of \(\mathrm{CVA}(t)\) and its approximation.
(d) What do you think about the numerical value of \(\mu\)?


4.4.6 Risk contribution of CVA with collateral 
This exercise is based on the results of Pykhtin and Rosen (2010). 


1. We consider the portfolio \(w = (w_1, \ldots, w_n)\), which is composed of n OTC contracts.
We assume that the mark-to-market for the contract \(\mathcal{C}_i\) is given by:

\[
\mathrm{MtM}_i(t) = \mu_i(t) + \sigma_i(t) X_i
\]

where \(X_i \sim \mathcal{N}(0, 1)\). Determine the probability distribution of the portfolio
mark-to-market:

\[
\mathrm{MtM}(t) = \sum_{i=1}^{n} w_i \cdot \mathrm{MtM}_i(t)
\]

when \((X_1, \ldots, X_n) \sim \mathcal{N}(\mathbf{0}_n, \rho)\) and \(\rho = (\rho_{i,j})\) is the correlation matrix.

2. Calculate the correlation \(\gamma_i(t)\) between \(\mathrm{MtM}_i(t)\) and \(\mathrm{MtM}(t)\).

3. Calculate the expected value of the counterparty exposure
\(e^{+}(t) = \max(\mathrm{MtM}(t) - C(t), 0)\) when the collateral value is given by
\(C(t) = \max(\mathrm{MtM}(t) - H, 0)\).

4. We consider the case where there is no collateral: \(C(t) = 0\). What is the implicit value
of H? Deduce the expression of \(\mathrm{EpE}(t; w) = \mathbb{E}[e^{+}(t)]\). Calculate the risk contribution
\(\mathcal{RC}_i\) of the contract \(\mathcal{C}_i\). Show that \(\mathrm{EpE}(t; w)\) satisfies the Euler allocation principle.

5. We consider the case where there is a collateral: \(C(t) \neq 0\). Calculate the risk
contribution \(\mathcal{RC}_i\) of the contract \(\mathcal{C}_i\). Demonstrate that:

\[
\sum_{i=1}^{n} \mathcal{RC}_i = \mathrm{EpE}(t; w) - H \cdot \Phi\left( \frac{\mu_w(t) - H}{\sigma_w(t)} \right)
\]

where \(\mu_w(t)\) and \(\sigma_w(t)\) are the expected value and volatility of \(\mathrm{MtM}(t)\). Comment
on this result.

6. Find the risk contribution \(\mathcal{RC}_i\) of type A Euler allocation.

7. Find the risk contribution \(\mathcal{RC}_i\) of type B Euler allocation.

8. We consider the Merton approach for modeling the default time \(\tau\) of the counterparty:

\[
X_i = \varrho_i X_B + \sqrt{1 - \varrho_i^2} \, \eta_i
\]

where \(X_B \sim \mathcal{N}(0, 1)\) and the idiosyncratic risk \(\eta_i \sim \mathcal{N}(0, 1)\) are independent.
Calculate the correlation \(\varrho_w(t)\) between \(\mathrm{MtM}(t)\) and \(X_B\). Deduce the relationship between
\(\mathrm{MtM}(t)\) and \(X_B\).

9. Let \(B(t) = \Phi^{-1}(1 - \mathbf{S}(t))\) be the default barrier and \(\mathbf{S}(t)\) the survival function of the
counterparty. How to compute the conditional counterparty exposure \(\mathbb{E}[e^{+}(t) \mid \tau = t]\)
and the corresponding risk contribution \(\mathcal{RC}_i\)? Give their expressions.


Chapter 5 


Operational Risk 


The integration of operational risk into the Basel II Accord was a long process because of 
the hostile reaction from the banking sector. At the end of the 1990s, the risk of operational 
losses was perceived as relatively minor. However, some events had shown that it was not 
the case. The most famous example was the bankruptcy of the Barings Bank in 1995. The 
loss of $1.3 bn was due to a huge position of the trader Nick Leeson in futures contracts 
without authorization. Other examples included the money laundering in Banco Ambrosiano 
Vatican Bank (1983), the rogue trading in Sumitomo Bank (1996), the headquarters fire of
Crédit Lyonnais (1996), etc. Since the publication of the CP2 in January 2001, the position 
of banks has significantly changed and operational risk is today perceived as a major risk 
for the banking industry. Management of operational risk has been strengthened, with the 
creation of dedicated risk management units, the appointment of compliance officers and 
the launch of anti-money laundering programs. 


5.1 Definition of operational risk 


The Basel Committee defines the operational risk in the following way: 


“Operational risk is defined as the risk of loss resulting from inadequate or failed
internal processes, people and systems or from external events. This definition
includes legal risk, but excludes strategic and reputational risk” (BCBS, 2006,
page 144).

Operational risk then covers all the losses of the bank that cannot be attributed to
market and credit risk. Nevertheless, losses that result from strategic decisions are not
taken into account. An example is the purchase of software or an information system
that is not relevant for the firm. Losses due to reputational risk are also excluded from
the definition of operational risk. They are generally caused by an event, which is related to 
another risk. The difficulty is to measure the indirect loss of such events in terms of business. 
For instance, if we consider the diesel emissions scandal of Volkswagen, we can estimate the 
losses due to the recall of cars, class action lawsuits and potential fines. However, it is 
impossible to know what the impact of this event will be on the future sales and the market 
share of Volkswagen. 

In order to better understand the concept of operational risk, we give here the loss event
type classification adopted by the Basel Committee:


1. Internal fraud (“losses due to acts of a type intended to defraud, misappropriate 
property or circumvent regulations, the law or company policy, excluding diver- 
sity/discrimination events, which involves at least one internal party”) 


(a) Unauthorized activity 
(b) Theft and fraud
2. External fraud (“losses due to acts of a type intended to defraud, misappropriate
property or circumvent the law, by a third party”)
(a) Theft and fraud
(b) Systems security
3. Employment practices and workplace safety (“losses arising from acts inconsistent
with employment, health or safety laws or agreements, from payment of personal
injury claims, or from diversity/discrimination events”)
(a) Employee relations 
(b) Safe environment 
(c) Diversity & discrimination 
4. Clients, products & business practices (“losses arising from an unintentional or
negligent failure to meet a professional obligation to specific clients (including fiduciary
and suitability requirements), or from the nature or design of a product”)
(a) Suitability, disclosure & fiduciary
(b) Improper business or market practices
(c) Product flaws
(d) Selection, sponsorship & exposure
(e) Advisory activities


5. Damage to physical assets (“losses arising from loss or damage to physical assets from 
natural disaster or other events”) 


(a) Disasters and other events 


6. Business disruption and system failures (“losses arising from disruption of business or 
system failures”) 


(a) Systems 


7. Execution, delivery & process management (“losses from failed transaction processing
or process management, from relations with trade counterparties and vendors”)
(a) Transaction capture, execution & maintenance
(b) Monitoring and reporting
(c) Customer intake and documentation
(d) Customer/client account management
(e) Trade counterparties
(f) Vendors & suppliers


This is a long list of loss types, because the banking industry has been a fertile ground for
operational risks. We have already cited some well-known operational losses before the crisis.
In 2009, the Basel Committee published the results of a loss data collection exercise.
For this LDCE, 119 banks submitted a total of 10.6 million internal losses with an overall
loss amount of €59.6 bn. The largest 20 losses represented a total of €17.6 bn. In Table
5.1, we have reported statistics of the loss data, when the loss is larger than €20 000. For
each year, we indicate the number \(n_L\) of losses, the total loss amount L and the number \(n_B\)
of reporting banks. Each bank experienced more than 300 losses larger than €20 000 per
year on average. We also notice that these losses represented about 90% of the overall loss
amount.


TABLE 5.1: Internal losses larger than €20 000 per year 


Year         | pre 2002    2002    2003    2004    2005    2006    2007
n_L          |    14017   10216   13691   22152   33216   36386   36622
L (in € bn)  |      3.8    12.1     4.6     7.2     9.7     7.4     7.9
n_B          |       24      35      55      68     108     115     117


Source: BCBS (2009d). 


Since 2008, operational risk has dramatically increased. For instance, rogue trading has
impacted many banks and the magnitude of these unauthorized trading losses is much
higher than before¹. The Libor interest rate manipulation scandal led to very large fines
($2.5 bn for Deutsche Bank, $1 bn for Rabobank, $545 mn for UBS, etc.). In May 2015, six
banks (Bank of America, Barclays, Citigroup, J.P. Morgan, UBS and RBS) agreed to pay
fines totaling $5.6 bn in the case of the forex scandal². Anti-money laundering controls
led BNP Paribas to pay a fine of $8.9 bn in June 2014 to the US federal government. In
this context, operational risk, and more specifically compliance and legal risk, is a major
concern for banks. It is an expensive risk, because of the direct losses, but also because of the
indirect costs induced by the proliferation of internal controls and security infrastructure³.


Remark 60 Operational risk is not limited to the banking sector. Other financial sectors 
have been impacted by such risk. The most famous example is the Ponzi scheme organized 
by Bernard Madoff, which caused a loss of $50 bn to his investors. 


5.2 Basel approaches for calculating the regulatory capital 


In this section, we present the three approaches described in the Basel II framework in 
order to calculate the capital charge for operational risk: 


1. the basic indicator approach (BIA); 
2. the standardized approach (TSA); 


3. and advanced measurement approaches (AMA). 


¹We can cite Société Générale in 2008 ($7.2 bn), Morgan Stanley in 2008 ($9.0 bn), BPCE in 2008 ($1.1
bn), UBS in 2011 ($2 bn) and JPMorgan Chase in 2012 ($5.8 bn).
²The Libor scandal was a series of fraudulent actions connected to the Libor (London Interbank Offered
Rate), while the forex scandal concerns several banks, which have manipulated exchange rates via electronic
chatrooms in which traders discussed the trades they planned to do.
³A typical example of expensive cost is the risk of cyber attacks.


We also present the Basel III framework of the standardized approach for measuring oper- 
ational risk capital with effect from January 2022. 


5.2.1 The basic indicator approach 


The basic indicator approach is the simplest method for calculating the operational risk
capital requirement. In this case, the capital charge is a fixed percentage of annual gross
income:

\[
K = \alpha \cdot \overline{\mathrm{GI}}
\]

where \(\alpha\) is set equal to 15% and \(\overline{\mathrm{GI}}\) is the average of the positive gross income over the
previous three years:

\[
\overline{\mathrm{GI}} = \frac{\max(\mathrm{GI}_{t-1}, 0) + \max(\mathrm{GI}_{t-2}, 0) + \max(\mathrm{GI}_{t-3}, 0)}{\sum_{k=1}^{3} \mathbb{1}\{\mathrm{GI}_{t-k} > 0\}}
\]

In this approach, the capital charge is related to the financial results of the bank, but not
to its risk exposure.


5.2.2 The standardized approach 


The standardized approach is an extended version of the previous method. In this case,
the bank is divided into eight business lines, which are given in Table 5.2. The bank then
calculates the capital charge for each business line:

\[
K_{j,t} = \beta_j \cdot \mathrm{GI}_{j,t}
\]

where \(\beta_j\) and \(\mathrm{GI}_{j,t}\) are a fixed percentage⁴ and the gross income corresponding to the j-th
business line. The total capital charge is the three-year average of the sum of all the capital
charges:

\[
K = \frac{1}{3} \sum_{k=1}^{3} \max\left( \sum_{j=1}^{8} K_{j,t-k}, 0 \right)
\]

We notice that a negative capital charge in one business line may offset positive capital
charges in other business lines. If the values of gross income are all positive, the total
capital charge becomes:

\[
K = \frac{1}{3} \sum_{k=1}^{3} \sum_{j=1}^{8} \beta_j \cdot \mathrm{GI}_{j,t-k} = \sum_{j=1}^{8} \beta_j \cdot \overline{\mathrm{GI}}_j
\]

where \(\overline{\mathrm{GI}}_j\) is the average gross income over the previous three years of the j-th business line.


⁴The values taken by the beta coefficient are reported in Table 5.2.

TABLE 5.2: Mapping of business lines for operational risk

Level 1               | Level 2                            | β_j
Corporate Finance†    | Corporate Finance                  | 18%
                      | Municipal/Government Finance       |
                      | Merchant Banking                   |
                      | Advisory Services                  |
Trading & Sales‡      | Sales                              | 18%
                      | Market Making                      |
                      | Proprietary Positions              |
                      | Treasury                           |
Retail Banking        | Retail Banking                     | 12%
                      | Private Banking                    |
                      | Card Services                      |
Commercial Banking§   | Commercial Banking                 | 15%
Payment & Settlement  | External Clients                   | 18%
Agency Services       | Custody                            | 15%
                      | Corporate Agency                   |
                      | Corporate Trust                    |
Asset Management      | Discretionary Fund Management      | 12%
                      | Non-Discretionary Fund Management  |
Retail Brokerage      | Retail Brokerage                   | 12%

†Mergers and acquisitions, underwriting, securitization, syndications, IPO, debt placements.
‡Buying and selling of securities and derivatives, own position securities, lending and repos, brokerage.
§Project finance, real estate, export finance, trade finance, factoring, leasing, lending, guarantees, bills of
exchange.

Example 51 We consider Bank A, whose activity is mainly driven by retail banking and
asset management. We compare it with Bank B, which is more focused on corporate finance.
We assume that the two banks are only composed of four business lines: corporate finance,
retail banking, agency services and asset management. The gross income expressed in $ mn
for the last three years is given below:

Business line      |        Bank A         |        Bank B
                   |  t−1     t−2     t−3  |  t−1     t−2     t−3
Corporate finance  |   10      15     −30  |  200     300     150
Retail banking     |  250     230     205  |   50      45     −30
Agency services    |   10      10      12  |
Asset management   |   70      65      72  |   12       8      −4


For Bank A, we obtain \(\mathrm{GI}_{t-1} = 340\), \(\mathrm{GI}_{t-2} = 320\) and \(\mathrm{GI}_{t-3} = 259\). The average gross
income is then equal to 306.33, implying that the BIA capital charge \(K_A^{\mathrm{BIA}}\) is equal to $45.95
mn. If we consider Bank B, the required capital \(K_B^{\mathrm{BIA}}\) is lower and equal to $36.55 mn. In
the case of the standardized approach, the beta coefficients are respectively equal to 18%,
12%, 15% and 12%. We deduce that:

\[
\begin{aligned}
K_A^{\mathrm{TSA}} = \frac{1}{3} \times \big( & \max(18\% \times 10 + 12\% \times 250 + 15\% \times 10 + 12\% \times 70, 0) \, + \\
& \max(18\% \times 15 + 12\% \times 230 + 15\% \times 10 + 12\% \times 65, 0) \, + \\
& \max(-18\% \times 30 + 12\% \times 205 + 15\% \times 12 + 12\% \times 72, 0) \big) \\
= \ & \$36.98 \text{ mn}
\end{aligned}
\]

We also have \(K_B^{\mathrm{TSA}}\) = $42.24 mn. We notice that \(K_A^{\mathrm{BIA}} > K_A^{\mathrm{TSA}}\) and \(K_B^{\mathrm{BIA}} < K_B^{\mathrm{TSA}}\). Bank
A has a lower capital charge when using TSA instead of BIA, because it is more exposed
to low-risk business lines (retail banking and asset management). For Bank B, it is the
contrary because its main exposure concerns high-risk business lines (corporate finance).
However, if we assume that the gross income of the corporate finance for Bank B at time
t − 3 is equal to −150 instead of +150, we obtain \(K_B^{\mathrm{BIA}}\) = $46.13 mn and \(K_B^{\mathrm{TSA}}\) = $34.60
mn. In this case, the TSA approach is favorable, because the gross income at time t − 3 is
negative implying that the capital contribution at time t − 3 is equal to zero.
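
The figures of this example can be reproduced with a few lines of Python. This is only a sketch: the function names and the dictionary layout are assumptions made for illustration, and the missing agency services income of Bank B is assumed to be zero.

```python
ALPHA = 0.15  # BIA percentage
BETA = {
    "corporate finance": 0.18,
    "retail banking": 0.12,
    "agency services": 0.15,
    "asset management": 0.12,
}


def yearly_gross_income(gi_by_line):
    """Total gross income of the bank for each year (t-1, t-2, t-3)."""
    return [sum(year) for year in zip(*gi_by_line.values())]


def bia_capital(gi_by_line):
    """alpha times the average of the positive yearly gross incomes."""
    positive = [gi for gi in yearly_gross_income(gi_by_line) if gi > 0]
    return ALPHA * sum(positive) / len(positive) if positive else 0.0


def tsa_capital(gi_by_line):
    """Three-year average of the floored sum of business line charges."""
    lines = list(gi_by_line)
    n_years = len(gi_by_line[lines[0]])
    yearly = [max(sum(BETA[line] * gi_by_line[line][k] for line in lines), 0.0)
              for k in range(n_years)]
    return sum(yearly) / n_years


# Gross income in $ mn, ordered as (t-1, t-2, t-3)
bank_a = {
    "corporate finance": [10, 15, -30],
    "retail banking": [250, 230, 205],
    "agency services": [10, 10, 12],
    "asset management": [70, 65, 72],
}
bank_b = {
    "corporate finance": [200, 300, 150],
    "retail banking": [50, 45, -30],
    "agency services": [0, 0, 0],  # assumption: blank cells read as zero
    "asset management": [12, 8, -4],
}
# Variant of Bank B with a corporate finance income of -150 at time t-3
bank_b_stressed = dict(bank_b, **{"corporate finance": [200, 300, -150]})

results = {name: (bia_capital(gi), tsa_capital(gi))
           for name, gi in [("A", bank_a), ("B", bank_b),
                            ("B stressed", bank_b_stressed)]}
```

The BIA averages only the years with positive gross income, whereas the TSA floors each yearly sum at zero but still divides by three, which explains why the stressed case favors the TSA.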


Unlike the basic indicator approach, which requires no qualifying criteria, banks must
satisfy a list of qualifying criteria for the standardized approach. For instance, the board
of directors must be actively involved in the oversight of the operational risk management
framework and each business line must have sufficient resources to manage operational risk.
Internationally active banks must also collect operational losses and use this information for
taking appropriate action.


5.2.3 Advanced measurement approaches 


Like the internal model-based approach for market risk, the AMA method is defined by
certain criteria without referring to a specific statistical model:

• The capital charge should cover the one-year operational loss at the 99.9% confidence
level. It corresponds to the sum of expected loss (EL) and unexpected loss (UL).

• The model must be estimated using a minimum five-year observation period of internal
loss data, and capture tail loss events by considering for example external loss data
when it is needed. It must also include scenario analysis and factors reflecting internal
control systems.

• The risk measurement system must be sufficiently granular to capture the main
operational risk factors. By default, the operational risk of the bank must be divided into
the 8 business lines and the 7 event types. For each cell of the matrix, the model must
estimate the loss distribution and may use correlations to perform the aggregation.

• The allocation of economic capital across business lines must create incentives to
improve operational risk management.

• The model can incorporate the risk mitigation impact of insurance, which is limited
to 20% of the total operational risk capital charge.


The validation of the AMA model does not only concern the measurement aspects, but also 
the soundness of the entire operational risk management system. This concerns governance 
of operational risk, dedicated resources, management structure, risk maps and key risk 
indicators (KRI), notification and action procedures, emergency and crisis management, 
business continuity and disaster recovery plans. 

In order to better understand the challenges of an internal model, we have reported
in Table 5.3 the distribution of annualized loss amounts by business line and event type
obtained with the 2008 loss data collection exercise. We first notice heterogeneity between
business lines. For instance, losses were mainly concentrated in the fourth event type
(clients, products & business practices) for the corporate finance business line (93.7%) and
the seventh event type (execution, delivery & process management) for the payment &
settlement business line (76.4%). On average, these two event types represented more than
75% of the total loss amount. In contrast, the fifth and sixth event types (damage to physical
assets, business disruption and system failures) had a small contribution close to 1%. We
also notice that operational losses mainly affected retail banking, followed by corporate
finance and trading & sales. One of the issues is that this picture of operational risk is no
longer valid after 2008 with the increase of losses in trading & sales, but also in payment
& settlement. The nature of operational risk changes over time, which makes it challenging
to build an internal model to calculate the required capital.


TABLE 5.3: Distribution of annualized operational losses (in %)

                      |                  Event type                     |
Business line         |    1     2     3     4     5     6     7        |   All
Corporate Finance     |  0.2   0.1   0.6  93.7   0.0   0.0   5.4        |  28.0
Trading & Sales       | 11.0   0.3   2.3  29.0   0.2   1.8  55.3        |  13.6
Retail Banking        |  6.3  19.4   9.8  40.4   1.1   1.5  21.4        |  32.0
Commercial Banking    | 11.4  15.2   3.1  35.5   0.4   1.7  32.6        |   7.6
Payment & Settlement  |  2.8   7.1   0.9   7.3   3.2   2.3  76.4        |   2.6
Agency Services       |  1.0   3.2   0.7  36.0  18.2   6.0  35.0        |   2.6
Asset Management      | 11.1   1.0   2.5  30.8   0.3   1.5  52.8        |   2.5
Retail Brokerage      | 18.1   1.4   6.3  59.5   0.1   0.2  14.4        |   5.1
Unallocated           |  6.5   2.8  28.4  28.3   6.5   1.3  26.2        |   6.0
All                   |  6.1   8.0   6.0  52.4   1.4   1.2  24.9        | 100.0


Source: BCBS (2009d). 


5.2.4 Basel III approach 


From January 2022, the standardized measurement approach (SMA) will replace the 
three approaches of the Basel II framework. The SMA is based on three components: the 
business indicator (BI), the business indicator component (BIC) and the internal loss mul- 
tiplier (ILM). The business indicator is a proxy of the operational risk: 


BI = ILDC+SC+FC 


where ILDC is the interest, leases and dividends component, SC is the services component 
and FC is the financial component. The underlying idea is to list the main activities that 
generate operational risk: 


\[
\begin{aligned}
\mathrm{ILDC} &= \min\left( |\mathrm{INC} - \mathrm{EXP}|, 2.25\% \cdot \mathrm{IRE} \right) + \mathrm{DIV} \\
\mathrm{SC} &= \max(\mathrm{OI}, \mathrm{OE}) + \max(\mathrm{FI}, \mathrm{FE}) \\
\mathrm{FC} &= |\Pi_{\mathrm{TB}}| + |\Pi_{\mathrm{BB}}|
\end{aligned}
\]

where INC represents the interest income, EXP the interest expense, IRE the interest
earning assets, DIV the dividend income, OI the other operating income, OE the other
operating expense, FI the fee income, FE the fee expense, \(\Pi_{\mathrm{TB}}\) the net P&L of the trading
book and \(\Pi_{\mathrm{BB}}\) the net P&L of the banking book. All these variables are calculated as the
average over the last three years. We can draw a parallel between the business indicator
components and the TSA components. For example, ILDC concerns corporate finance, retail
banking, commercial banking, SC is related to payment & settlement, agency services, asset
management, retail brokerage, while FC mainly corresponds to trading & sales. Once the
BI is calculated and expressed in $ bn, we determine the business indicator component,
which is given by:


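\[
\mathrm{BIC} = 12\% \cdot \min(\mathrm{BI}, 1) + 15\% \cdot \min\left( (\mathrm{BI} - 1)^{+}, 29 \right) + 18\% \cdot (\mathrm{BI} - 30)^{+}
\]

where \(x^{+} = \max(x, 0)\) and the amounts are expressed in $ bn. The BIC formula recalls the
basic indicator approach of Basel II, but it introduces a marginal weight by BI tranches.
Finally, the bank has to compute the internal loss multiplier, which is defined as:

\[
\mathrm{ILM} = \ln\left( e - 1 + \left( \frac{15 \cdot \bar{L}}{\mathrm{BIC}} \right)^{0.8} \right)
\]

where \(\bar{L}\) is the average annual operational risk losses over the last 10 years. ILM can be
lower or greater than one, depending on the value of \(\bar{L}\):

\[
\left\{
\begin{array}{lcl}
\mathrm{ILM} < 1 & \Leftrightarrow & \bar{L} < \mathrm{BIC} / 15 \\
\mathrm{ILM} = 1 & \Leftrightarrow & \bar{L} = \mathrm{BIC} / 15 \\
\mathrm{ILM} > 1 & \Leftrightarrow & \bar{L} > \mathrm{BIC} / 15
\end{array}
\right.
\]

The capital charge for the operational risk is then equal to⁵:

\[
K = \mathrm{ILM} \cdot \mathrm{BIC}
\]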

Remark 61 The SMA of the Basel III framework may be viewed as a mix of the three 
approaches of the Basel II framework: BIA, TSA and AMA. Indeed, SMA is clearly a 
modified version of BIA by considering a basic indicator based on sources of operational 
risk. In this case, the business indicator can be related to TSA. Finally, the introduction of 
the ILM coefficient is a way to consider a more sensitive approach based on internal losses, 
which is the basic component of AMA. 


5.3 Loss distribution approach 


Although the Basel Committee does not advocate any particular method for the AMA
method in the Basel II framework, the loss distribution approach (LDA) is the recognized
standard model for calculating the capital charge. This model is not specific to operational
risk because it was developed in the context of collective risk theory at the beginning of
the 1900s. However, operational risk presents some characteristics that need to be considered.


5.3.1 Definition 


The loss distribution approach is described in Klugman et al. (2012) and Frachot et
al. (2001). We assume that the operational loss L of the bank is divided into a matrix of
homogeneous losses:

\[
L = \sum_{k=1}^{K} S_k \tag{5.1}
\]

where \(S_k\) is the sum of losses of the k-th cell and K is the number of cells in the matrix.
For instance, if we consider the Basel II classification, the mapping matrix contains 56 cells 
corresponding to the 8 business lines and 7 event types. The loss distribution approach is a 


5 However, the computation of the ILM coefficient is subject to some standard requirements. For instance, 
ILM is set to one for banks with a BIC lower than $1 bn and supervisors can impose the value of the ILM 
coefficient for banks that do not meet loss data collection criteria. 
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method to model the random loss S; of a particular cell. It assumes that Sp is the random 
sum of homogeneous individual losses: 


$$S_k = \sum_{n=1}^{N_k(t)} X_n^{(k)} \tag{5.2}$$


where $N_k(t)$ is the random number of individual losses for the period $[0, t]$ and $X_n^{(k)}$ is the
$n$-th individual loss. For example, if we consider internal fraud in corporate finance, we can
write the loss for the next year as follows:


$$S = X_1 + X_2 + \ldots + X_{N(1)}$$


where $X_1$ is the first observed loss, $X_2$ is the second observed loss, $X_{N(1)}$ is the last observed
loss of the year and $N(1)$ is the number of losses for the next year. We notice that we face
two sources of uncertainty: 


1. we don’t know what will be the magnitude of each loss event (severity risk);


2. we don’t know how many losses will occur in the next year (frequency risk). 


In order to simplify the notations, we omit the index k and rewrite the random sum as
follows:

$$S = \sum_{n=1}^{N(t)} X_n \tag{5.3}$$

The loss distribution approach is based on the following assumptions: 


• the number $N(t)$ of losses follows the loss frequency distribution P; the probability
  that the number of loss events is equal to $n$ is denoted by $p(n)$;

• the sequence of individual losses $X_n$ is independent and identically distributed (iid);
  the corresponding probability distribution F is called the loss severity distribution;

• the number of events is independent from the amount of loss events.


Once the probability distributions P and F are chosen, we can determine the probability 
distribution of the aggregate loss S, which is denoted by G and is called the compound 
distribution. 


Example 52 We assume that the number of losses is distributed as follows: 


n | 0 1 2 3 
p(n) | 50% 30% 17% 3% 


The loss amount can take the values $100 and $200 with probabilities 70% and 30%. 


To calculate the probability distribution G of the compound loss, we first define the
probability distribution of $X_1$, $X_1 + X_2$ and $X_1 + X_2 + X_3$, because the maximum number
of losses is equal to 3. If there is only one loss, we have Pr{X_1 = 100} = 70% and
Pr{X_1 = 200} = 30%. In the case of two losses, we obtain Pr{X_1 + X_2 = 200} = 49%,
Pr{X_1 + X_2 = 300} = 42% and Pr{X_1 + X_2 = 400} = 9%. Finally, the sum of three losses
takes the values 300, 400, 500 and 600 with probabilities 34.3%, 44.1%, 18.9% and 2.7%
respectively. We notice that these probabilities are in fact conditional to the number of
losses. Using the law of total probability, we obtain:


$$\Pr\{S = s\} = \sum_{n} \Pr\left\{\sum_{i=1}^{n} X_i = s \;\middle|\; N(t) = n\right\} \cdot \Pr\{N(t) = n\}$$


We deduce that:

$$\Pr\{S = 0\} = \Pr\{N(t) = 0\} = 50\%$$

and:

$$\Pr\{S = 100\} = \Pr\{X_1 = 100\} \times \Pr\{N(t) = 1\} = 70\% \times 30\% = 21\%$$

The compound loss can take the value 200 in two different ways:

$$
\begin{aligned}
\Pr\{S = 200\} &= \Pr\{X_1 = 200\} \times \Pr\{N(t) = 1\} + \Pr\{X_1 + X_2 = 200\} \times \Pr\{N(t) = 2\} \\
&= 30\% \times 30\% + 49\% \times 17\% \\
&= 17.33\%
\end{aligned}
$$

For the other values of S, we obtain Pr{S = 300} = 8.169%, Pr{S = 400} = 2.853%,
Pr{S = 500} = 0.567% and Pr{S = 600} = 0.081%.
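The computations of Example 52 can be reproduced with a few lines of code. The following sketch builds the n-fold convolutions of the discrete severity distribution and weights them by p(n); the function names `convolve` and `compound_pmf` are illustrative choices, not notation from the text.

```python
from collections import defaultdict


def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent discrete random variables."""
    out = defaultdict(float)
    for a, prob_a in pmf_a.items():
        for b, prob_b in pmf_b.items():
            out[a + b] += prob_a * prob_b
    return dict(out)


def compound_pmf(frequency, severity):
    """Pr{S = s} = sum over n of Pr{X_1 + ... + X_n = s} * p(n)."""
    total = defaultdict(float)
    n_fold = {0: 1.0}  # distribution of the empty sum (no loss)
    for n in range(max(frequency) + 1):
        if n > 0:
            n_fold = convolve(n_fold, severity)  # n-fold convolution of F
        for s, prob in n_fold.items():
            total[s] += frequency.get(n, 0.0) * prob
    return dict(sorted(total.items()))


frequency = {0: 0.50, 1: 0.30, 2: 0.17, 3: 0.03}  # p(n) of Example 52
severity = {100: 0.70, 200: 0.30}                 # loss amounts in dollars
g = compound_pmf(frequency, severity)
for s, prob in g.items():
    print(f"Pr{{S = {s}}} = {prob:.3%}")
```

The brute-force convolution is only practical because the severity distribution is discrete with a small support; the continuous case requires the methods of Section 5.3.3.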


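The previous example shows that the cumulative distribution function of S can be
written as⁶:

$$
G(s) =
\begin{cases}
p(0) + \sum_{n=1}^{\infty} p(n) \, F^{n\star}(s) & \text{for } s > 0 \\
p(0) & \text{for } s = 0
\end{cases}
\tag{5.4}
$$

where $F^{n\star}$ is the n-fold convolution of F with itself:

$$F^{n\star}(s) = \Pr\left\{\sum_{i=1}^{n} X_i \le s\right\} \tag{5.5}$$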


In Figure 5.1, we give an example of a continuous compound distribution when the annual
number of losses follows the Poisson distribution $\mathcal{P}(50)$ and the individual losses follow the
log-normal distribution $\mathcal{LN}(8, 5)$. The capital charge, which is also called the capital-at-risk
(CaR), corresponds then to the percentile $\alpha$:

$$\mathrm{CaR}(\alpha) = G^{-1}(\alpha) \tag{5.6}$$


The regulatory capital is obtained by setting $\alpha$ to 99.9%: $K = \mathrm{CaR}(99.9\%)$. This
capital-at-risk is valid for one cell of the operational risk matrix. Another issue is to calculate
the capital-at-risk for the bank as a whole. This requires defining the dependence function
between the different compound losses $(S_1, S_2, \ldots, S_K)$. In summary, here are the different
steps to implement the loss distribution approach:


• for each cell of the operational risk matrix, we estimate the loss frequency distribution
  and the loss severity distribution;

• we then calculate the capital-at-risk;

• we define the dependence function between the different cells of the operational risk
  matrix, and deduce the aggregate capital-at-risk.


⁶ When F is a discrete probability function, it is easy to calculate $F^{n\star}(s)$ and then deduce $G(s)$. However,
the determination of $G(s)$ is more difficult in the general case of continuous probability functions. This
issue is discussed in Section 5.3.3.

FIGURE 5.1: Compound distribution when $N \sim \mathcal{P}(50)$ and $X \sim \mathcal{LN}(8, 5)$


5.3.2 Parametric estimation 


We first consider the estimation of the severity distribution, because we will see that the 
estimation of the frequency distribution can only be done after this first step. 


5.3.2.1 Estimation of the loss severity distribution 


We assume that the bank has an internal loss database. We note $\{x_1, \ldots, x_T\}$ the sample
collected for a given cell of the operational risk matrix. We consider that the individual losses
follow a given parametric distribution F:

$$X \sim F(x; \theta)$$

where $\theta$ is the vector of parameters to estimate.


In order to be a good candidate for modeling the loss severity, the probability distribution 
F must satisfy the following properties: the support of F is the interval R+, it is sufficiently 
flexible to accommodate a wide variety of empirical loss data and it can fit large losses. We 
list here the cumulative distribution functions that are the most used in operational risk 
models: 


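• Gamma: $X \sim \mathcal{G}(\alpha, \beta)$

$$F(x; \theta) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}$$

  where $\alpha > 0$ and $\beta > 0$, and $\gamma(\alpha, x)$ denotes the lower incomplete gamma function.

• Log-gamma: $X \sim \mathcal{LG}(\alpha, \beta)$

$$F(x; \theta) = \frac{\gamma(\alpha, \beta \ln x)}{\Gamma(\alpha)}$$

  where $\alpha > 0$ and $\beta > 0$.

• Log-logistic: $X \sim \mathcal{LL}(\alpha, \beta)$

$$F(x; \theta) = \frac{1}{1 + (x/\alpha)^{-\beta}} = \frac{x^{\beta}}{\alpha^{\beta} + x^{\beta}}$$

  where $\alpha > 0$ and $\beta > 0$.

• Log-normal: $X \sim \mathcal{LN}(\mu, \sigma^2)$

$$F(x; \theta) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$$

  where $x > 0$ and $\sigma > 0$.

• Generalized extreme value: $X \sim \mathcal{GEV}(\mu, \sigma, \xi)$

$$F(x; \theta) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}$$

  where $x > \mu - \sigma/\xi$, $\sigma > 0$ and $\xi > 0$.

• Pareto: $X \sim \mathcal{P}(\alpha, x_-)$

$$F(x; \theta) = 1 - \left(\frac{x}{x_-}\right)^{-\alpha}$$

  where $x \ge x_-$, $\alpha > 0$ and $x_- > 0$.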


The vector of parameters $\theta$ can be estimated by the method of maximum likelihood
(ML) or the generalized method of moments (GMM). In Chapter 10, we show that the
log-likelihood function associated to the sample is:

$$\ell(\theta) = \sum_{i=1}^{T} \ln f(x_i; \theta) \tag{5.7}$$

where $f(x; \theta)$ is the density function. In the case of the GMM, the empirical moments are:

$$
\begin{cases}
h_{i,1}(\theta) = x_i - \mathbb{E}[X] \\
h_{i,2}(\theta) = (x_i - \mathbb{E}[X])^2 - \operatorname{var}(X)
\end{cases}
\tag{5.8}
$$


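In Table 5.4, we report the density function $f(x; \theta)$, the mean $\mathbb{E}[X]$ and the variance $\operatorname{var}(X)$
when X follows one of the probability distributions described previously. For instance, if we
consider that $X \sim \mathcal{LN}(\mu, \sigma^2)$, it follows that the log-likelihood function is:

$$\ell(\theta) = -\sum_{i=1}^{T} \ln x_i - \frac{T}{2} \ln \sigma^2 - \frac{T}{2} \ln 2\pi - \frac{1}{2} \sum_{i=1}^{T} \left(\frac{\ln x_i - \mu}{\sigma}\right)^2$$

whereas the empirical moments are:

$$
\begin{cases}
h_{i,1}(\theta) = x_i - e^{\mu + \frac{1}{2}\sigma^2} \\
h_{i,2}(\theta) = \left(x_i - e^{\mu + \frac{1}{2}\sigma^2}\right)^2 - e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)
\end{cases}
$$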
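TABLE 5.4: Density function, mean and variance of parametric probability distributions

Distribution | $f(x; \theta)$ | $\mathbb{E}[X]$ | $\operatorname{var}(X)$
$\mathcal{G}(\alpha, \beta)$ | $\dfrac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$ | $\dfrac{\alpha}{\beta}$ | $\dfrac{\alpha}{\beta^2}$
$\mathcal{LG}(\alpha, \beta)$ | $\dfrac{\beta^{\alpha} (\ln x)^{\alpha - 1}}{x^{\beta + 1} \Gamma(\alpha)}$ | $\left(\dfrac{\beta}{\beta - 1}\right)^{\alpha}$ if $\beta > 1$ | $\left(\dfrac{\beta}{\beta - 2}\right)^{\alpha} - \left(\dfrac{\beta}{\beta - 1}\right)^{2\alpha}$ if $\beta > 2$
$\mathcal{LL}(\alpha, \beta)$ | $\dfrac{\beta (x/\alpha)^{\beta - 1}}{\alpha \left(1 + (x/\alpha)^{\beta}\right)^2}$ | $\dfrac{\alpha \pi}{\beta \sin(\pi/\beta)}$ if $\beta > 1$ | $\alpha^2 \left(\dfrac{2\pi}{\beta \sin(2\pi/\beta)} - \dfrac{\pi^2}{\beta^2 \sin^2(\pi/\beta)}\right)$ if $\beta > 2$
$\mathcal{LN}(\mu, \sigma^2)$ | $\dfrac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\dfrac{1}{2}\left(\dfrac{\ln x - \mu}{\sigma}\right)^2\right)$ | $\exp\left(\mu + \dfrac{1}{2}\sigma^2\right)$ | $\exp\left(2\mu + \sigma^2\right)\left(\exp\left(\sigma^2\right) - 1\right)$
$\mathcal{GEV}(\mu, \sigma, \xi)$ | $\dfrac{1}{\sigma}\left[1 + \xi\left(\dfrac{x - \mu}{\sigma}\right)\right]^{-(1 + 1/\xi)} \exp\left\{-\left[1 + \xi\left(\dfrac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}$ | $\mu + \dfrac{\sigma}{\xi}\left(\Gamma(1 - \xi) - 1\right)$ if $\xi < 1$ | $\dfrac{\sigma^2}{\xi^2}\left(\Gamma(1 - 2\xi) - \Gamma^2(1 - \xi)\right)$ if $\xi < \dfrac{1}{2}$
$\mathcal{P}(\alpha, x_-)$ | $\dfrac{\alpha x_-^{\alpha}}{x^{\alpha + 1}}$ | $\dfrac{\alpha x_-}{\alpha - 1}$ if $\alpha > 1$ | $\dfrac{\alpha x_-^2}{(\alpha - 1)^2 (\alpha - 2)}$ if $\alpha > 2$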


In the case of the log-normal distribution, the vector $\theta$ is composed of two parameters $\mu$
and $\sigma$, implying that two moments are sufficient to define the GMM estimator. This is
also the case of other probability distributions, except the GEV distribution that requires
specification of three empirical moments⁷.


Example 53 We assume that the individual losses take the following values expressed in 
thousand dollars: 10.1, 12.5, 14, 25, 317.3, 353, 1200, 1254, 52000 and 251 000. 


Using the method of maximum likelihood, we find that $\hat{\alpha}_{ML}$ and $\hat{\beta}_{ML}$ are equal to 15.70 and
1.22 for the log-gamma distribution and 293 721 and 0.51 for the log-logistic distribution.
In the case of the log-normal distribution⁸, we obtain $\hat{\mu}_{ML} = 12.89$ and $\hat{\sigma}_{ML} = 3.35$.
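For the log-normal distribution, the ML estimator has a closed form: the estimate of the location parameter is the sample mean of the log-losses and the estimate of the scale parameter is their standard deviation computed with the divisor T. The sketch below applies this to the data of Example 53, under the assumption that the losses are converted from thousand dollars to dollars before taking logarithms; the function name is illustrative.

```python
import math

# Losses of Example 53, expressed in thousand dollars
losses_thousand_usd = [10.1, 12.5, 14, 25, 317.3, 353, 1200, 1254, 52000, 251000]


def lognormal_mle(losses):
    """Closed-form ML estimates (mu, sigma) of a log-normal sample."""
    logs = [math.log(x) for x in losses]
    mu = sum(logs) / len(logs)
    # ML uses the divisor T, not T - 1
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs))
    return mu, sigma


losses_usd = [1000.0 * x for x in losses_thousand_usd]  # assumed unit: dollars
mu_hat, sigma_hat = lognormal_mle(losses_usd)
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```

The other distributions of the list have no closed-form ML estimator, so a numerical optimizer applied to the log-likelihood (5.7) would be needed for them.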


The previous analysis assumes that the sample of operational losses for estimating $\theta$
represents a comprehensive and homogeneous information of the underlying probability
distribution F. In practice, loss data are plagued by various sources of bias. The first issue lies
in the data generating process which underlies the way data have been collected. In almost 
all cases, loss data have gone through a truncation process by which data are recorded only 
when their amounts are higher than some thresholds. In practice, banks’ internal thresholds 
are set in order to balance two conflicting wishes: collecting as many data as possible while 
reducing costs by collecting only significant losses. These thresholds, which are defined by 
the global risk management policy of the bank, must satisfy some criteria: 


⁷ We can use the moment of order 3, which corresponds to:

$$\mathbb{E}\left[(X - \mathbb{E}[X])^3\right] = \frac{\sigma^3}{\xi^3}\left(\Gamma(1 - 3\xi) - 3\,\Gamma(1 - 2\xi)\,\Gamma(1 - \xi) + 2\,\Gamma^3(1 - \xi)\right)$$

⁸ If we consider the generalized method of moments, the estimates are $\hat{\mu}_{GMM} = 16.26$ and $\hat{\sigma}_{GMM} = 1.40$.

“A bank must have an appropriate de minimis gross loss threshold for internal
loss data collection, for example €10000. The appropriate threshold may vary 
somewhat between banks, and within a bank across business lines and/or event 
types. However, particular thresholds should be broadly consistent with those 
used by peer banks” (BCBS, 2006, page 153). 


The second issue concerns the use of relevant external data, especially when there is reason to 
believe that the bank is exposed to infrequent, yet potentially severe losses. Typical examples 
are rogue trading or cyber attacks. If the bank has not yet experienced a large amount of 
loss due to these events in the past, this does not mean that it will never experience such 
problems in the future. Therefore, internal loss data must be supplemented by external data 
from public and/or pooled industry databases. However, incorporating external data
is rather dangerous and requires a careful methodology to avoid the pitfalls regarding data
heterogeneity, scaling problems and lack of comparability between overly heterogeneous data.
Unfortunately, there is no satisfactory solution to deal with these scaling issues, implying
that banks use external data by taking into account only reporting biases and a fixed and
known threshold⁹.


The previous issues imply that operational risk loss data cannot be reduced to the sample
of individual losses; it also requires specifying the threshold $H_i$ for each individual
loss $x_i$. The form of operational loss data is then $\{(x_i, H_i), i = 1, \ldots, T\}$, where $x_i$ is the
observed value of X knowing that X is larger than the threshold $H_i$. Reporting thresholds
affect severity estimation in the sense that the sample severity distribution (i.e. the severity
distribution of reported losses) is different from the ‘true’ one (i.e. the severity distribution
one would obtain if all the losses were reported). Unfortunately, the true distribution
is the most relevant for calculating capital charge. As a consequence, linking the sample
distribution to the true one is a necessary task. From a mathematical point of view, the
true distribution is the probability distribution of X whereas the sample distribution is the
probability distribution of $X \mid X > H_i$. We deduce that the sample distribution for a given
threshold H is the conditional probability distribution defined as follows:


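$$
\begin{aligned}
F^{\star}(x; \theta \mid H) &= \Pr\{X \le x \mid X > H\} \\
&= \frac{\Pr\{X \le x, X > H\}}{\Pr\{X > H\}} \\
&= \frac{\Pr\{X \le x\} - \Pr\{X \le \min(x, H)\}}{\Pr\{X > H\}} \\
&= \mathbb{1}\{x > H\} \cdot \frac{F(x; \theta) - F(H; \theta)}{1 - F(H; \theta)}
\end{aligned}
\tag{5.9}
$$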


It follows that the density function is:

$$f^{\star}(x; \theta \mid H) = \mathbb{1}\{x > H\} \cdot \frac{f(x; \theta)}{1 - F(H; \theta)}$$


⁹ See Baud et al. (2002, 2003) for more advanced techniques based on unknown and stochastic thresholds.

To estimate the vector of parameters $\theta$, we continue to use the method of maximum likelihood
or the generalized method of moments by considering the correction due to the
truncation of data. For the ML estimator, we have then:

$$
\begin{aligned}
\ell(\theta) &= \sum_{i=1}^{T} \ln f^{\star}(x_i; \theta \mid H_i) \\
&= \sum_{i=1}^{T} \ln f(x_i; \theta) + \sum_{i=1}^{T} \ln \mathbb{1}\{x_i > H_i\} - \sum_{i=1}^{T} \ln\left(1 - F(H_i; \theta)\right)
\end{aligned}
\tag{5.10}
$$


where $H_i$ is the threshold associated to the $i$-th observation. The correction term
$-\sum_{i=1}^{T} \ln\left(1 - F(H_i; \theta)\right)$ shows that maximizing a conventional log-likelihood function
which ignores data truncation is totally misleading. We also notice that this term vanishes
when $H_i$ is equal to zero¹⁰. For the GMM estimator, the empirical moments become:

$$
\begin{cases}
h_{i,1}(\theta) = x_i - \mathbb{E}[X \mid X > H_i] \\
h_{i,2}(\theta) = \left(x_i - \mathbb{E}[X \mid X > H_i]\right)^2 - \operatorname{var}(X \mid X > H_i)
\end{cases}
\tag{5.11}
$$

There is no reason that the conditional moment $\mathbb{E}[X^m \mid X > H_i]$ is equal to the unconditional
moment $\mathbb{E}[X^m]$. Therefore, the conventional GMM estimator is biased and this is
why we have to apply again the threshold correction.


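If we consider again the log-normal distribution, the expression of the log-likelihood
function (5.10) is¹¹:

$$
\ell(\theta) = -\frac{T}{2} \ln 2\pi - \frac{T}{2} \ln \sigma^2 - \sum_{i=1}^{T} \ln x_i - \frac{1}{2} \sum_{i=1}^{T} \left(\frac{\ln x_i - \mu}{\sigma}\right)^2 - \sum_{i=1}^{T} \ln\left(1 - \Phi\left(\frac{\ln H_i - \mu}{\sigma}\right)\right)
$$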


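Let us now calculate the conditional moment $\mu'_m(X) = \mathbb{E}[X^m \mid X > H]$. By using the
notation $\Phi_c(x) = 1 - \Phi\left((x - \mu)/\sigma\right)$, we have:

$$
\begin{aligned}
\mu'_m(X) &= \frac{1}{\Phi_c(\ln H)} \int_{H}^{\infty} \frac{x^m}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2\right) \mathrm{d}x \\
&= \frac{1}{\Phi_c(\ln H)} \int_{\ln H}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2 + m y\right) \mathrm{d}y \\
&= \frac{\exp\left(m\mu + m^2\sigma^2/2\right)}{\Phi_c(\ln H)} \int_{\ln H}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y - (\mu + m\sigma^2)}{\sigma}\right)^2\right) \mathrm{d}y \\
&= \frac{\Phi_c\left(\ln H - m\sigma^2\right)}{\Phi_c(\ln H)} \exp\left(m\mu + m^2\sigma^2/2\right)
\end{aligned}
$$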


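We deduce that:

$$\mathbb{E}[X \mid X > H] = a(\theta, H) = \frac{1 - \Phi\left(\dfrac{\ln H - \mu - \sigma^2}{\sigma}\right)}{1 - \Phi\left(\dfrac{\ln H - \mu}{\sigma}\right)} \, e^{\mu + \frac{1}{2}\sigma^2}$$

and:

$$\mathbb{E}[X^2 \mid X > H] = b(\theta, H) = \frac{1 - \Phi\left(\dfrac{\ln H - \mu - 2\sigma^2}{\sigma}\right)}{1 - \Phi\left(\dfrac{\ln H - \mu}{\sigma}\right)} \, e^{2\mu + 2\sigma^2}$$

¹⁰ Indeed, we have $F(0; \theta) = 0$ and $\ln(1 - F(0; \theta)) = 0$.
¹¹ By construction, the observed value $x_i$ is larger than the threshold $H_i$, meaning that $\ln \mathbb{1}\{x_i > H_i\}$ is
equal to 0.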


We finally obtain: 


$$
\begin{cases}
h_{i,1}(\theta) = x_i - a(\theta, H_i) \\
h_{i,2}(\theta) = x_i^2 - 2 x_i \, a(\theta, H_i) + 2 a^2(\theta, H_i) - b(\theta, H_i)
\end{cases}
$$


In order to illustrate the impact of the truncation, we report in Figure 5.2 the cumulative 
distribution function and the probability density function of X | X > H when X follows 
the log-normal distribution LN (8,5). The threshold H is set at $10000, meaning that the
bank collects operational losses when the amount is larger than this threshold. In the bottom 
panels of the figure, we indicate the mean and the variance with respect to the threshold 
H. We notice that data truncation increases the magnitude of the mean and the variance. 
For instance, when H is set at $10000, the conditional mean and variance are multiplied 
by a factor equal to 3.25 with respect to the unconditional mean and variance. 
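The closed-form expression of the conditional moments makes this effect easy to quantify. The sketch below evaluates E[X^m | X > H] for the log-normal distribution and computes the multiplier of the mean for LN (8,5) and H = $10 000; it assumes that the second parameter of LN is the variance, so that the scale parameter is the square root of 5. The helper names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def conditional_moment(m, mu, sigma, threshold):
    """E[X^m | X > H] when X ~ LN(mu, sigma^2), using the closed form above."""
    log_h = math.log(threshold)
    survival_shifted = 1.0 - normal_cdf((log_h - mu - m * sigma**2) / sigma)
    survival = 1.0 - normal_cdf((log_h - mu) / sigma)
    return math.exp(m * mu + 0.5 * m**2 * sigma**2) * survival_shifted / survival


mu, sigma, threshold = 8.0, math.sqrt(5.0), 10_000.0
unconditional_mean = math.exp(mu + 0.5 * sigma**2)
conditional_mean = conditional_moment(1, mu, sigma, threshold)   # a(theta, H)
conditional_second = conditional_moment(2, mu, sigma, threshold)  # b(theta, H)
mean_multiplier = conditional_mean / unconditional_mean
print(f"mean multiplier = {mean_multiplier:.2f}")
```

The conditional variance can be deduced as `conditional_second - conditional_mean**2`.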


FIGURE 5.2: Impact of the threshold H on the severity distribution (panels: cumulative distribution function, probability density function, mean and variance as functions of H; curves with and without truncation)


Example 54 We consider Example 53 and assume that the losses have been collected using 
a unique threshold that is equal to $5 000. 


By using the truncation correction, the ML estimates become $\hat{\mu}_{ML} = 8.00$ and $\hat{\sigma}_{ML} = 5.71$
for the log-normal model. In Figure 5.3, we compare the log-normal cumulative distribution
function without and with the truncation correction. We notice that the results are
very different.
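To see why the estimates move so much, one can evaluate the truncated log-likelihood (5.10) at both sets of parameters. The sketch below is a minimal implementation for the log-normal case, assuming the losses of Example 53 are expressed in dollars and share the single threshold of $5 000; the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_lognormal_loglik(mu, sigma, losses, thresholds):
    """Log-likelihood (5.10) for log-normal losses recorded above thresholds."""
    total = 0.0
    for x, h in zip(losses, thresholds):
        z = (math.log(x) - mu) / sigma
        # log-density of the untruncated log-normal distribution
        total += -math.log(x) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
        # truncation correction -ln(1 - F(H_i; theta)); it vanishes when H_i = 0
        if h > 0.0:
            total -= math.log(1.0 - normal_cdf((math.log(h) - mu) / sigma))
    return total


losses = [10_100.0, 12_500.0, 14_000.0, 25_000.0, 317_300.0, 353_000.0,
          1_200_000.0, 1_254_000.0, 52_000_000.0, 251_000_000.0]
thresholds = [5_000.0] * len(losses)

uncorrected = truncated_lognormal_loglik(12.89, 3.35, losses, thresholds)
corrected = truncated_lognormal_loglik(8.00, 5.71, losses, thresholds)
print(f"log-likelihood at (12.89, 3.35): {uncorrected:.3f}")
print(f"log-likelihood at (8.00, 5.71):  {corrected:.3f}")
```

To recover the estimates themselves, the negative of this function can be passed to a numerical optimizer (for example `scipy.optimize.minimize`, if SciPy is available), using the uncorrected estimates as starting values.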


FIGURE 5.3: Comparison of the estimated severity distributions

The previous example shows that estimating the parameters of the probability distribution
is not sufficient to define the severity distribution. Indeed, ML and GMM give
two different log-normal probability distributions. The issue is to decide which is the best
parametrization. In a similar way, the choice between the several probability families
(log-normal, log-gamma, GEV, Pareto, etc.) is an open question. This is why fitting the severity
distribution does not reduce to estimating the parameters of a given probability distribution.
It must be completed by a second step that consists in selecting the best estimated
probability distribution. However, traditional goodness-of-fit tests (Kolmogorov-Smirnov,
Anderson-Darling, etc.) are not useful, because they concern the entire probability distribution.
In operational risk, extreme events are more relevant. This explains why QQ plots
and order statistics are generally used to assess the fitting of the upper tail. A QQ plot
represents the quantiles of the empirical distribution against those of the theoretical model.
If the statistical model describes perfectly the data, we obtain the diagonal line $y = x$.
In Figure 5.4, we show an example of QQ plot. We notice that the theoretical quantiles 
obtained from the statistical model are in line with those calculated with the empirical data 
when the quantile is lower than 80%. Otherwise, the theoretical quantiles are above the em- 
pirical quantiles, meaning that extreme events are underestimated by the statistical model. 
We deduce that the body of the distribution is well estimated, but not the upper tail of 
the distribution. However, medium losses are less important than high losses in operational 
risk. 


5.3.2.2 Estimation of the loss frequency distribution 


In order to model the frequency distribution, we have to specify the counting process
$N(t)$, which defines the number of losses occurring during the time period $[0, t]$. The number
of losses for the time period $[t_1, t_2]$ is then equal to:

$$N(t_1; t_2) = N(t_2) - N(t_1)$$

FIGURE 5.4: An example of QQ plot where extreme events are underestimated

We generally make the following assumptions about the stochastic process $N(t)$:


• the distribution of the number of losses $N(t; t + h)$ for each $h > 0$ is independent of
  $t$; moreover, $N(t; t + h)$ is stationary and depends only on the time interval $h$;

• the random variables $N(t_1; t_2)$ and $N(t_3; t_4)$ are independent if the time intervals
  $[t_1, t_2]$ and $[t_3, t_4]$ are disjoint;

• no more than one loss may occur at time $t$.

These simple assumptions define a Poisson process, which satisfies the following properties:

1. there exists a scalar $\lambda > 0$ such that the distribution of $N(t)$ has a Poisson distribution
   with parameter $\lambda t$;

2. the duration between two successive losses is iid and follows the exponential distribution
   $\mathcal{E}(\lambda)$.

Let $p(n)$ be the probability to have $n$ losses. We deduce that:

$$p(n) = \Pr\{N(t) = n\} = \frac{e^{-\lambda t} (\lambda t)^n}{n!} \tag{5.12}$$


Without loss of generality, we can fix $t = 1$ because it corresponds to the required one-year
time period for calculating the capital charge. In this case, $N(1)$ simply follows a Poisson
distribution with parameter $\lambda$. This probability distribution has a useful property for time
aggregation. Indeed, the sum of two independent Poisson variables $N_1$ and $N_2$ with parameters
$\lambda_1$ and $\lambda_2$ is also a Poisson variable with parameter $\lambda_1 + \lambda_2$. This property is a direct
result of the definition of the Poisson process. In particular, we have:

$$N(1) = \sum_{k=1}^{K} N\left(\frac{k - 1}{K}; \frac{k}{K}\right) \sim \mathcal{P}(\lambda)$$

where $N\left((k - 1)/K; k/K\right) \sim \mathcal{P}(\lambda/K)$. This means that we can estimate the frequency
distribution at a quarterly or monthly period and convert it to an annual period by simply
multiplying the quarterly or monthly intensity parameter by 4 or 12.

The estimation of the annual intensity $\lambda$ can be done using the method of maximum
likelihood. In this case, $\hat{\lambda}$ is the mean of the annual number of losses:

$$\hat{\lambda} = \frac{1}{n_y} \sum_{y=1}^{n_y} N_y \tag{5.13}$$

where $N_y$ is the number of losses occurring at year $y$ and $n_y$ is the number of observations.
One of the key features of the Poisson distribution is that the variance equals the mean:

$$\lambda = \mathbb{E}[N(1)] = \operatorname{var}(N(1)) \tag{5.14}$$

We can use this property to estimate $\lambda$ by the method of moments. If we consider the first
moment, we obtain the ML estimator, whereas we have with the second moment:

$$\hat{\lambda} = \frac{1}{n_y} \sum_{y=1}^{n_y} \left(N_y - \bar{N}\right)^2$$

where $\bar{N}$ is the average number of losses.


Example 55 We assume that the annual number of losses from 2006 to 2015 is the fol- 
lowing: 57, 62, 45, 24, 82, 36, 98, 75, 76 and 45. 


The mean is equal to 60 whereas the variance is equal to 474.40. In Figure 5.5, we show
the probability mass function of the Poisson distribution with parameter 60. We notice that
the parameter $\lambda$ is not large enough to reproduce the variance and the range of the sample.
However, using the moment estimator based on the variance is completely unrealistic.
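The two moment estimators can be compared directly on the data of Example 55. The sketch below uses the population variance (divisor n_y), which is the convention that reproduces the value 474.40; the variable names are illustrative.

```python
import statistics

# Annual number of losses from 2006 to 2015 (Example 55)
annual_losses = [57, 62, 45, 24, 82, 36, 98, 75, 76, 45]

# First-moment (and ML) estimator: the sample mean
lambda_first_moment = statistics.fmean(annual_losses)

# Second-moment estimator: the sample variance with divisor n_y
lambda_second_moment = statistics.pvariance(annual_losses)

# A ratio far above one signals overdispersion with respect to the Poisson model
dispersion_ratio = lambda_second_moment / lambda_first_moment

print(f"mean = {lambda_first_moment:.2f}")
print(f"variance = {lambda_second_moment:.2f}")
print(f"variance / mean = {dispersion_ratio:.2f}")
```

The gap between the two estimates is what motivates a frequency model with a variance larger than its mean.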


When the variance exceeds the mean, we use the negative binomial distribution
$\mathcal{NB}(r, p)$, which is defined as follows:

$$p(n) = \binom{r + n - 1}{n} (1 - p)^r p^n$$

where $r > 0$ and $p \in [0, 1]$. The negative binomial distribution can be viewed as the probability
distribution of the number of successes in a sequence of iid Bernoulli random variables
$\mathcal{B}(p)$ until we get $r$ failures. The negative binomial distribution is then a generalization of
the geometric distribution. Concerning the first two moments, we have:

$$\mathbb{E}[\mathcal{NB}(r, p)] = \frac{p \, r}{1 - p}$$

and:

$$\operatorname{var}(\mathcal{NB}(r, p)) = \frac{p \, r}{(1 - p)^2}$$

We verify that:

$$\operatorname{var}(\mathcal{NB}(r, p)) = \frac{1}{1 - p} \cdot \mathbb{E}[\mathcal{NB}(r, p)] > \mathbb{E}[\mathcal{NB}(r, p)]$$

FIGURE 5.5: PMF of the Poisson distribution $\mathcal{P}(60)$

Remark 62 The negative binomial distribution corresponds to a Poisson process where the
intensity parameter is random and follows a gamma distribution¹²:

$$\mathcal{NB}(r, p) \sim \mathcal{P}(\Lambda) \quad \text{and} \quad \Lambda \sim \mathcal{G}(\alpha, \beta)$$

where $\alpha = r$ and $\beta = (1 - p)/p$.


We consider again Example 55 and assume that the number of losses is described by
the negative binomial distribution. Using the method of moments, we obtain the following
estimates:

$$\hat{r} = \frac{m^2}{v - m} = \frac{60^2}{474.40 - 60} = 8.6873$$

and:

$$\hat{p} = \frac{v - m}{v} = \frac{474.40 - 60}{474.40} = 0.8735$$

where $m$ is the mean and $v$ is the variance of the sample. Using these estimates as the
starting values of the numerical optimization procedure, the ML estimates are $\hat{r} = 7.7788$
and $\hat{p} = 0.8852$. We report the corresponding probability mass function $p(n)$ in Figure 5.6.
We notice that this distribution better describes the sample than the Poisson distribution,
because it has a larger support. In fact, we show in Figure 5.7 the probability density
function of $\lambda$ for the two estimated counting processes. For the Poisson distribution, $\lambda$ is
constant and equal to 60, whereas $\lambda$ has a gamma distribution $\mathcal{G}(7.7788, 0.1296)$ in the case
of the negative binomial distribution. The variance of the gamma distribution explains the
larger variance of the negative binomial distribution with respect to the Poisson distribution,
while we notice that the two distributions have the same mean.
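The method-of-moments estimates follow from inverting the two moment equations of the negative binomial distribution, with mean p r / (1 - p) and variance p r / (1 - p)^2. The sketch below implements this inversion together with the probability mass function, written with log-gamma functions so that non-integer values of r are handled; the function names are illustrative.

```python
import math


def negative_binomial_mom(mean, variance):
    """Invert m = p r / (1 - p) and v = p r / (1 - p)^2 for (r, p)."""
    if variance <= mean:
        raise ValueError("the method of moments requires variance > mean")
    r = mean**2 / (variance - mean)
    p = (variance - mean) / variance
    return r, p


def negative_binomial_pmf(n, r, p):
    """p(n) = Gamma(n + r) / (n! Gamma(r)) * (1 - p)^r * p^n."""
    log_coeff = math.lgamma(n + r) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_coeff + r * math.log(1.0 - p) + n * math.log(p))


r_hat, p_hat = negative_binomial_mom(60.0, 474.40)
print(f"r = {r_hat:.4f}, p = {p_hat:.4f}")

# Sanity check: the fitted distribution reproduces the sample mean
fitted_mean = sum(n * negative_binomial_pmf(n, r_hat, p_hat) for n in range(2000))
print(f"fitted mean = {fitted_mean:.2f}")
```

These values are only starting points: the ML estimates reported in the text require a numerical maximization of the log-likelihood.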


¹² See Exercise 5.4.6.

FIGURE 5.6: PMF of the negative binomial distribution

FIGURE 5.7: Probability density function of the parameter $\lambda$

As in the case of the severity distribution, data truncation and reporting bias have an
impact on the frequency distribution (Frachot et al., 2006). For instance, if one bank’s
reporting threshold H is set at a high level, then the average number of reported losses will
be low. It does not imply that the bank is allowed to have a lower capital charge than
another bank that uses a lower threshold and is otherwise identical to the first one. It simply
means that the average number of losses must be corrected for reporting bias as well. It
appears that the calibration of the frequency distribution comes as a second step (after having
calibrated the severity distribution) because the aforementioned correction needs an estimate
of the exceedance probability Pr{X > H} for its calculation. This is rather straightforward:
the difference (more precisely the ratio) between the number of reported events and the ‘true’
number of events (which would be obtained if all the losses were reported, i.e. with a zero
threshold) corresponds exactly to the probability of one loss being higher than the threshold.
This probability is a direct by-product of the severity distribution.

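Let $N_H(t)$ be the number of events that are larger than the threshold H. By definition,
$N_H(t)$ is the counting process of exceedance events:

$$N_H(t) = \sum_{i=1}^{N(t)} \mathbb{1}\{X_i > H\}$$

It follows that:

$$
\begin{aligned}
\mathbb{E}[N_H(t)] &= \mathbb{E}\left[\sum_{i=1}^{N(t)} \mathbb{1}\{X_i > H\}\right] \\
&= \mathbb{E}\left[\mathbb{E}\left[\sum_{i=1}^{n} \mathbb{1}\{X_i > H\} \;\middle|\; N(t) = n\right]\right] \\
&= \mathbb{E}[N(t)] \cdot \mathbb{E}[\mathbb{1}\{X_i > H\}]
\end{aligned}
$$

because the random variables $X_1, \ldots, X_n$ are iid and independent from the random number
of events $N(t)$. We deduce that:

$$
\begin{aligned}
\mathbb{E}[N_H(t)] &= \mathbb{E}[N(t)] \cdot \Pr\{X_i > H\} \\
&= \mathbb{E}[N(t)] \cdot \left(1 - F(H; \theta)\right)
\end{aligned}
\tag{5.15}
$$
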
This latter equation provides information about the transformation of the counting process
$N(t)$ into the exceedance process. However, it only concerns the mean and not the
distribution itself. One interesting feature of data truncation is when the distribution of the
threshold exceedance process belongs to the same distribution class of the counting process.
It is the case of the Poisson distribution:

$$\mathcal{P}_H(\lambda) = \mathcal{P}(\lambda_H)$$

Using Equation (5.15), it follows that the Poisson parameter $\lambda_H$ of the exceedance process is
simply the product of the Poisson parameter $\lambda$ by the exceedance probability Pr{X > H}:

$$\lambda_H = \lambda \cdot \left(1 - F(H; \theta)\right)$$

We deduce that the estimator $\hat{\lambda}$ has the following expression:

$$\hat{\lambda} = \frac{\hat{\lambda}_H}{1 - F(H; \hat{\theta})}$$

where $\hat{\lambda}_H$ is the average number of losses that are collected above the threshold H and
$F(x; \hat{\theta})$ is the parametric estimate of the severity distribution.

Example 56 We consider that the bank has collected the loss data from 2006 to 2015 with a
threshold of \$20 000. For a given event type, the calibrated severity distribution corresponds
to a log-normal distribution with parameters $\hat{\mu} = 7.3$ and $\hat{\sigma} = 2.1$, whereas the annual
number of losses is the following: 23, 13, 50, 12, 25, 36, 48, 27, 18 and 35.

Using the Poisson distribution, we obtain $\hat{\lambda}_H = 28.70$. The probability that the loss exceeds
the threshold H is equal to:

$$\Pr\{X > 20\,000\} = 1 - \Phi\left(\frac{\ln(20\,000) - 7.3}{2.1}\right) = 10.75\%$$

This means that only 10.75% of losses can be observed when we apply a threshold of \$20 000.
We then deduce that the estimate of the Poisson parameter is equal to:

$$\hat{\lambda} = \frac{28.70}{10.75\%} = 266.90$$

On average, there are in fact about 270 loss events per year.
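The correction of Example 56 chains the two calibration steps: the severity distribution gives the exceedance probability, which then rescales the observed intensity. A minimal sketch, with illustrative function and variable names:

```python
import math
import statistics


def lognormal_exceedance_probability(threshold, mu, sigma):
    """Pr{X > H} = 1 - Phi((ln H - mu) / sigma) for X ~ LN(mu, sigma^2)."""
    z = (math.log(threshold) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Annual number of reported losses above the threshold (Example 56)
reported_counts = [23, 13, 50, 12, 25, 36, 48, 27, 18, 35]

lambda_h = statistics.fmean(reported_counts)  # intensity of the exceedance process
exceedance = lognormal_exceedance_probability(20_000.0, mu=7.3, sigma=2.1)
lambda_hat = lambda_h / exceedance            # intensity corrected for truncation

print(f"lambda_H = {lambda_h:.2f}")
print(f"Pr(X > H) = {exceedance:.2%}")
print(f"lambda = {lambda_hat:.2f}")
```

Because the exceedance probability sits in the denominator, any error in the estimated severity parameters is passed on directly to the corrected frequency.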


We could discuss whether the previous result remains valid in the case of the negative
binomial distribution. If it is the case, then we have:

$$\mathcal{NB}_H(r, p) = \mathcal{NB}(r_H, p_H)$$

Using Equation (5.15), we deduce that:

$$\frac{p_H \, r_H}{1 - p_H} = \frac{p \, r}{1 - p} \cdot \left(1 - F(H; \theta)\right)$$

If we assume that $r_H$ is equal to $r$, we obtain:

$$p_H = \frac{p \cdot \left(1 - F(H; \theta)\right)}{1 - p \cdot F(H; \theta)}$$

We verify the following inequality $0 \le p_H \le p \le 1$. However, this solution is not completely
satisfactory.


5.3.3 Calculating the capital charge 


Once the frequency and severity distributions are calibrated, the computation of the
capital charge is straightforward. For that, we can use the Monte Carlo method or different
analytical methods. The Monte Carlo method is much more used, because it is more flexible
and gives better results in the case of low frequency/high severity events. Analytical
approaches, which are very popular in insurance, can be used for high frequency/low severity
events. One remaining challenge, however, is aggregating the capital charge of the different
cells of the mapping matrix. By construction, the loss distribution approach assumes that
aggregate losses are independent. Nevertheless, regulators are forcing banks to take into
account positive correlation between risk events. The solution is then to consider copula
functions.


5.3.3.1 Monte Carlo approach 


We reiterate that the one-year compound loss of a given cell is defined as follows:

$$S = \sum_{i=1}^{N(1)} X_i$$

where $X_i \sim F$ and $N(1) \sim P$. The capital-at-risk is then the 99.9% quantile of the compound
loss distribution. To estimate the capital charge by Monte Carlo, we first simulate the annual
number of losses from the frequency distribution and then simulate individual losses in order
to calculate the compound loss. Finally, the quantile is estimated by order statistics. The
algorithm is described below.


Algorithm 1 Compute the capital-at-risk for an operational risk cell

  Initialize the number of simulations n_S
  for j = 1 : n_S do
    Simulate an annual number n of losses from the frequency distribution P
    S_j ← 0
    for i = 1 : n do
      Simulate a loss X_i from the severity distribution F
      S_j ← S_j + X_i
    end for
  end for
  Calculate the order statistics S_{1:n_S}, ..., S_{n_S:n_S}
  Deduce the capital-at-risk CaR = S_{α n_S : n_S} with α = 99.9%
  return CaR
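A possible translation of Algorithm 1 is sketched below, using only the standard library. It assumes a Poisson frequency simulated with Knuth's multiplication method (adequate for small intensities only) and a log-normal severity LN (8,4) read as location 8 and variance 4; the function names, the seed and the number of simulations are arbitrary illustrative choices.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small intensities."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def capital_at_risk(n_sims, sample_count, sample_loss, alpha=0.999):
    """Empirical alpha-quantile of the simulated compound losses."""
    compound_losses = []
    for _ in range(n_sims):
        n = sample_count()  # annual number of losses
        compound_losses.append(sum(sample_loss() for _ in range(n)))
    compound_losses.sort()  # order statistics S_{1:ns} <= ... <= S_{ns:ns}
    index = min(n_sims - 1, max(0, math.ceil(alpha * n_sims) - 1))
    return compound_losses[index]


rng = random.Random(12345)  # arbitrary seed, for reproducibility only
car = capital_at_risk(
    n_sims=100_000,
    sample_count=lambda: sample_poisson(4.0, rng),
    sample_loss=lambda: rng.lognormvariate(8.0, 2.0),  # sigma = sqrt(4)
)
print(f"CaR(99.9%) ~ ${car:,.0f}")
```

Passing the samplers as callables keeps the quantile logic independent of the distributions, so that another frequency or severity model can be plugged in without touching `capital_at_risk`.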


Let us illustrate this algorithm when $N(1) \sim \mathcal{P}(4)$ and $X_i \sim \mathcal{LN}(8, 4)$. Using a linear
congruential method, the simulated values of $N(1)$ are 3, 4, 1, 2, 3, etc. while the simulated
values of $X_i$ are 3388.6, 259.8, 13328.3, 39.7, 1220.8, 1486.4, 15197.1, 3205.3, 5070.4, 84704.1,
64.9, 1237.5, 187073.6, 4757.8, 50.3, 2805.7, etc. For the first simulation, we have three losses
and we obtain:

$$S_1 = 3388.6 + 259.8 + 13328.3 = \$16\,976.7$$

For the second simulation, the number of losses is equal to four and the compound loss is
equal to:

$$S_2 = 39.7 + 1220.8 + 1486.4 + 15197.1 = \$17\,944.0$$

For the third simulation, we obtain $S_3 = \$3\,205.3$, and so on. Using $n_S$ simulations, the
value of the capital charge is estimated with the 99.9% empirical quantile based on order
statistics. For instance, Figure 5.8 shows the histogram of 2000 simulated values of the
capital-at-risk estimated with one million simulations. The true value is equal to \$3.24 mn.
However, we notice that the variance of the estimator is large. Indeed, the range of the MC
estimator is between \$3.10 mn and \$3.40 mn in our experiments with one million simulation
runs.


The estimation of the capital-at-risk with a high accuracy is therefore difficult. The
convergence of the Monte Carlo algorithm is slow and the estimated quantile can be very far
from the true quantile especially when the severity loss distribution is heavy tailed and the
confidence level $\alpha$ is high. That’s why it is important to control the accuracy of $\hat{G}^{-1}(\alpha)$.
This can be done by verifying that the estimated moments are close to the theoretical ones.
For the mean and the variance, we have:

$$\mathbb{E}[S] = \mathbb{E}[N(1)] \cdot \mathbb{E}[X_i]$$

and:

$$\operatorname{var}(S) = \mathbb{E}[N(1)] \cdot \operatorname{var}(X_i) + \operatorname{var}(N(1)) \cdot \mathbb{E}^2[X_i]$$
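These two formulas give the theoretical benchmarks against which simulated moments can be compared. The sketch below evaluates them for a compound Poisson distribution with log-normal severity, taking an intensity of 10 and a location parameter of 5 as illustrative inputs; for the Poisson distribution the mean and the variance of N(1) coincide. The function names are illustrative.

```python
import math


def compound_moments(mean_n, var_n, mean_x, var_x):
    """E[S] = E[N] E[X] and var(S) = E[N] var(X) + var(N) E[X]^2."""
    mean_s = mean_n * mean_x
    var_s = mean_n * var_x + var_n * mean_x**2
    return mean_s, var_s


def lognormal_moments(mu, sigma):
    """Mean and variance of LN(mu, sigma^2)."""
    mean_x = math.exp(mu + 0.5 * sigma**2)
    var_x = math.exp(2.0 * mu + sigma**2) * (math.exp(sigma**2) - 1.0)
    return mean_x, var_x


lam = 10.0  # Poisson intensity: E[N(1)] = var(N(1)) = lam
for sigma in (1.0, 2.5):
    mean_x, var_x = lognormal_moments(5.0, sigma)
    mean_s, var_s = compound_moments(lam, lam, mean_x, var_x)
    print(f"sigma = {sigma}: E[S] = {mean_s:,.1f}, sd(S) = {math.sqrt(var_s):,.1f}")
```

In a simulation study, the ratio between the empirical standard deviation of the simulated losses and the theoretical value returned here should approach one as the number of simulations grows.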


FIGURE 5.8: Histogram of the MC estimator $\widehat{\mathrm{CaR}}$

To illustrate the convergence problem, we consider the example of the compound Poisson
distribution where $N(1) \sim \mathcal{P}(10)$ and $X_i \sim \mathcal{LN}(5, \sigma^2)$. We compute the aggregate loss
distribution by the Monte Carlo method for different numbers $n_S$ of simulations and different
runs. To measure the accuracy, we calculate the ratio between the MC standard deviation
$\hat{\sigma}_{n_S}(S)$ and the true value $\sigma(S)$:

$$R(n_S) = \frac{\hat{\sigma}_{n_S}(S)}{\sigma(S)}$$

We notice that the convergence is much more erratic when $\sigma$ takes a high value (Figure
5.10) than when $\sigma$ is low (Figure 5.9). When $\sigma$ takes the value 1, the convergence of the
Monte Carlo method is verified with 100000 simulations. When $\sigma$ takes the value 2.5, 100
million simulations are not sufficient to estimate the second moment, and then the
capital-at-risk. Indeed, the occurrence probability of extreme events is generally underestimated.
Sometimes, a severe loss is simulated implying a jump in the empirical standard deviation
(see Figure 5.10). This is why we need a large number of simulations in order to be confident
when estimating the 99.9% capital-at-risk with high severity distributions.


Remark 63 With the Monte Carlo approach, we can easily integrate mitigation factors
such as insurance coverage. An insurance contract is generally defined by a deductible¹³ $A$
and the maximum amount $B$ of a loss, which is covered by the insurer. The effective loss
$\tilde{X}_i$ suffered by the bank is then the difference between the loss of the event and the amount
paid by the insurer:

$$\tilde{X}_i = X_i - \max\left(\min(X_i, B) - A, 0\right)$$

The relationship between $X_i$ and $\tilde{X}_i$ is shown in Figure 5.11. In this case, the annual loss
of the bank becomes:

$$S = \sum_{i=1}^{N(1)} \tilde{X}_i$$

Taking into account an insurance contract is therefore equivalent to replacing $X_i$ by $\tilde{X}_i$ in the
Monte Carlo simulations.

¹³ It corresponds to the loss amount the bank has to cover by itself.

FIGURE 5.9: Convergence of the accuracy ratio $R(n_S)$ when $\sigma = 1$

FIGURE 5.10: Convergence of the accuracy ratio $R(n_S)$ when $\sigma = 2.5$


FIGURE 5.11: Impact of the insurance contract on the operational risk loss
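The effective loss of Remark 63 is easy to apply loss by loss inside a simulation. A minimal sketch follows; the function name `effective_loss` and its argument names are illustrative only.

```python
def effective_loss(x, deductible, cap):
    """Loss retained by the bank: X - max(min(X, B) - A, 0).

    deductible is A (the part the bank always bears) and cap is B
    (the maximum loss amount covered by the contract).
    """
    insurer_payment = max(min(x, cap) - deductible, 0.0)
    return x - insurer_payment
```

With $A = 10$ and $B = 50$, a loss of 5 is fully retained, a loss of 30 leaves the bank with 10, and a loss of 100 leaves the bank with 60 because the insurer pays at most $B - A = 40$.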


5.3.3.2 Analytical approaches 


There are three analytical (or semi-analytical) methods to compute the aggregate loss 
distribution: the solution based on characteristic functions, Panjer recursion and the single 
loss approximation. 


Method of characteristic functions Formally, the characteristic function of the random
variable $X$ is defined by:

$$\varphi_X(t) = \mathbb{E}\left[e^{itX}\right]$$

If $X$ has a continuous probability distribution $F$, we obtain:

$$\varphi_X(t) = \int_0^{\infty} e^{itx} \, \mathrm{d}F(x)$$

We notice that the characteristic function of the sum of $n$ independent random variables is
the product of their characteristic functions:

$$\varphi_{X_1 + \cdots + X_n}(t) = \mathbb{E}\left[e^{it(X_1 + \cdots + X_n)}\right] = \prod_{i=1}^{n} \mathbb{E}\left[e^{itX_i}\right] = \prod_{i=1}^{n} \varphi_{X_i}(t)$$

It follows that the characteristic function of the compound distribution $G$ is given by:

$$\varphi_S(t) = \sum_{n=0}^{\infty} p(n) \left(\varphi_X(t)\right)^n = \varphi_{N(1)}\left(\varphi_X(t)\right)$$

where $\varphi_{N(1)}(t)$ is the probability generating function of $N(1)$. For example, if $N(1) \sim \mathcal{P}(\lambda)$,
we have:

$$\varphi_{N(1)}(t) = e^{\lambda(t - 1)}$$

and:

$$\varphi_S(t) = e^{\lambda\left(\varphi_X(t) - 1\right)}$$

We finally deduce that $S$ has the probability density function given by the inverse Fourier
transform of $\varphi_S(t)$:

$$g(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \varphi_S(t) \, \mathrm{d}t$$

Using this expression, we can easily compute the cumulative distribution function and its
inverse with the fast Fourier transform.
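As an illustration of the last step, here is a minimal sketch of the FFT approach for a compound Poisson distribution, assuming NumPy and assuming that the severity has already been discretized on a grid $0, \delta, 2\delta, \ldots$. The function name `compound_poisson_fft` is hypothetical. The discrete Fourier transform plays the role of the characteristic function, so that the compound probabilities are obtained by inverting $\exp(\lambda(\varphi_X - 1))$.

```python
import numpy as np

def compound_poisson_fft(f, lam):
    """Probabilities of S on the grid 0, delta, 2*delta, ... for a Poisson(lam) frequency.

    f[n] is the discretized severity probability Pr{X = n*delta}. The vector should be
    zero-padded so that the mass of S beyond the grid is negligible; otherwise the
    periodicity of the discrete transform wraps that mass around (aliasing).
    """
    f = np.asarray(f, dtype=float)
    phi_x = np.fft.fft(f)                    # discrete analogue of the characteristic function
    phi_s = np.exp(lam * (phi_x - 1.0))      # Poisson generating function applied to phi_x
    return np.fft.ifft(phi_s).real           # back to probabilities
```

A simple sanity check is a severity concentrated on one monetary unit: the aggregate loss is then the Poisson count itself, and the output should match the Poisson probabilities.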


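Panjer recursive approach Panjer (1981) introduces recursive approaches to compute
high-order convolutions. He showed that if the probability mass function of the counting
process $N(t)$ satisfies:

$$p(n) = \left(a + \frac{b}{n}\right) p(n - 1)$$

where $a$ and $b$ are two scalars, then the following recursion holds:

$$g(x) = p(1) f(x) + \int_0^x \left(a + b\,\frac{y}{x}\right) f(y) \, g(x - y) \, \mathrm{d}y$$

where $x > 0$. For discrete severity distributions satisfying $f_n = \Pr\{X_i = n\delta\}$ where $\delta$ is the
monetary unit (e.g. $10 000), the Panjer recursion becomes:

$$g_n = \Pr\{S = n\delta\} = \frac{1}{1 - a f_0} \sum_{j=1}^{n} \left(a + \frac{bj}{n}\right) f_j \, g_{n-j}$$

where:

$$g_0 = \sum_{n=0}^{\infty} p(n) \left(f_0\right)^n =
\begin{cases}
p(0) \, e^{b f_0} & \text{if } a = 0 \\
p(0) \left(1 - a f_0\right)^{-1 - b/a} & \text{otherwise}
\end{cases}$$

The capital-at-risk is then equal to:

$$\mathrm{CaR}(\alpha) = n^{\star} \delta$$

where:

$$n^{\star} = \inf\left\{ n : \sum_{j=0}^{n} g_j \geq \alpha \right\}$$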
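The discrete recursion translates almost line by line into code. The sketch below assumes NumPy; the names `panjer` and `capital_at_risk` are illustrative. For a Poisson frequency with intensity $\lambda$, the parameters are $a = 0$, $b = \lambda$ and $p(0) = e^{-\lambda}$.

```python
import numpy as np

def panjer(f, a, b, p0, n_max):
    """Panjer recursion g_n = Pr{S = n*delta} for n = 0, ..., n_max.

    f[n] = Pr{X = n*delta} is the discretized severity, (a, b) are the parameters of
    the frequency recursion p(n) = (a + b/n) p(n-1), and p0 = p(0).
    """
    f = np.asarray(f, dtype=float)
    g = np.zeros(n_max + 1)
    # Initial value g_0
    if a == 0.0:
        g[0] = p0 * np.exp(b * f[0])
    else:
        g[0] = p0 * (1.0 - a * f[0]) ** (-1.0 - b / a)
    for n in range(1, n_max + 1):
        # f_j is zero beyond the end of the severity grid
        j = np.arange(1, min(n, len(f) - 1) + 1)
        g[n] = np.sum((a + b * j / n) * f[j] * g[n - j]) / (1.0 - a * f[0])
    return g

def capital_at_risk(g, delta, alpha):
    """CaR(alpha) = n* x delta, with n* the first n such that g_0 + ... + g_n >= alpha."""
    n_star = int(np.searchsorted(np.cumsum(g), alpha, side="left"))
    return n_star * delta
```

If `alpha` exceeds the total mass captured by the grid, `n_star` equals `len(g)`, which signals that `n_max` must be increased.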


Like the method of characteristic functions, the Panjer recursion is very popular among
academics, but produces significant numerical errors in practice when applied to operational
risk losses. The issue is the support of the compound distribution, whose range can be from
zero to several billions.

Example 57 We consider the compound Poisson distribution with log-normal losses and
different sets of parameters:

(a) $\lambda = 5$, $\mu = 5$, $\sigma = 1.0$;
(b) $\lambda = 5$, $\mu = 5$, $\sigma = 1.5$;
(c) $\lambda = 5$, $\mu = 5$, $\sigma = 2.0$;
(d) $\lambda = 50$, $\mu = 5$, $\sigma = 2.0$.

In order to implement the Panjer recursion, we have to perform a discretization of the
severity distribution. Using the central difference approximations, we have:

$$f_n = \Pr\left\{ n\delta - \frac{\delta}{2} \leq X_i \leq n\delta + \frac{\delta}{2} \right\} = F\left(n\delta + \frac{\delta}{2}\right) - F\left(n\delta - \frac{\delta}{2}\right)$$

To initialize the algorithm, we use the convention $f_0 = F(\delta/2)$. In Figure 5.12, we compare
the cumulative distribution function of the aggregate loss obtained with the Panjer recursion
and Monte Carlo simulations¹⁴. We deduce the capital-at-risk for different values of $\alpha$ in
Table 5.5. In our case, the Panjer algorithm gives a good approximation, because the support
of the distribution is 'bounded'. When the aggregate loss can take very large values, we
need a lot of iterations to achieve the convergence¹⁵. Moreover, we may have underflow in
computations because $g_0 \approx 0$.
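The discretization step can be sketched as follows for a log-normal severity, using only the standard library. The helper names `lognormal_cdf` and `discretize_severity` are illustrative, and the monetary unit `delta` and grid size `n_max` are choices left to the user.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of LN(mu, sigma^2)."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def discretize_severity(mu, sigma, delta, n_max):
    """Central difference discretization: f_0 = F(delta/2), f_n = F(n*delta + delta/2) - F(n*delta - delta/2)."""
    f = [lognormal_cdf(0.5 * delta, mu, sigma)]
    for n in range(1, n_max + 1):
        upper = lognormal_cdf(n * delta + 0.5 * delta, mu, sigma)
        lower = lognormal_cdf(n * delta - 0.5 * delta, mu, sigma)
        f.append(upper - lower)
    return f
```

By construction the probabilities telescope, so their sum equals $F((n_{\max} + 1/2)\delta)$. Comparing this sum with one is a quick way to check that the grid covers enough of the tail.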


TABLE 5.5: Comparison of the capital-at-risk calculated with Panjer recursion and Monte
Carlo simulations

         |        Panjer recursion         |     Monte Carlo simulations
   α     |  (a)     (b)     (c)      (d)   |  (a)     (b)     (c)      (d)
 90%     | 2400    4500   11000    91000   | 2350    4908   11648    93677
 95%     | 2900    6500   19000   120000   | 2896    6913   19063   123569
 99%     | 4300   13500   52000   231000   | 4274   13711   51908   233567
 99.5%   | 4900   18000   77000   308000   | 4958   17844   77754   310172
 99.9%   | 6800   32500  182000   604000   | 6773   32574  185950   604756


Single loss approximation If the severity belongs to the family of subexponential
distributions, then Böcker and Klüppelberg (2005) and Böcker and Sprittulla (2006) show
that the percentile of the compound distribution can be approximated by the following
expression:

$$G^{-1}(\alpha) \approx \left(\mathbb{E}[N(1)] - 1\right) \cdot \mathbb{E}[X_i] + F^{-1}\left(1 - \frac{1 - \alpha}{\mathbb{E}[N(1)]}\right) \qquad (5.16)$$

It follows that the capital-at-risk is the sum of the expected loss and the unexpected loss
defined as follows:

$$\mathrm{EL} = \mathbb{E}[N(1)] \cdot \mathbb{E}[X_i]$$

$$\mathrm{UL}(\alpha) = F^{-1}\left(1 - \frac{1 - \alpha}{\mathbb{E}[N(1)]}\right) - \mathbb{E}[X_i]$$

¹⁴ We use one million simulations.
¹⁵ In this case, it is not obvious that the Panjer recursion is faster than Monte Carlo simulations.

FIGURE 5.12: Comparison between the Panjer and MC compound distributions


To understand Formula (5.16), we recall that subexponential distributions are a special case
of heavy-tailed distributions and satisfy the following property:

$$\lim_{x \to \infty} \frac{\Pr\{X_1 + \cdots + X_n > x\}}{\Pr\{\max(X_1, \ldots, X_n) > x\}} = 1$$

This means that large values of the aggregate loss are dominated by the maximum loss of
one event. If we decompose the capital-at-risk as a sum of risk contributions, we obtain:

$$G^{-1}(\alpha) \approx \sum_{i=1}^{\mathbb{E}[N(1)]} \mathrm{RC}_i$$

where:

$$\mathrm{RC}_i = \mathbb{E}[X_i] \quad \text{for } i \neq i^{\star}$$

and:

$$\mathrm{RC}_{i^{\star}} = F^{-1}\left(1 - \frac{1 - \alpha}{\mathbb{E}[N(1)]}\right)$$

In this model, the capital-at-risk is mainly explained by the single largest loss $i^{\star}$. If we
neglect the small losses, the capital-at-risk at the confidence level $\alpha_{\mathrm{CaR}}$ is related to the
quantile $\alpha_{\mathrm{Severity}}$ of the loss severity:

$$\alpha_{\mathrm{Severity}} = 1 - \frac{1 - \alpha_{\mathrm{CaR}}}{\mathbb{E}[N(1)]}$$

This relationship¹⁶ is shown in Figure 5.13 and explains why this framework is called the
single loss approximation (SLA). For instance, if the annual number of losses is equal to
100 on average, computing the capital-at-risk with a 99.9% confidence level is equivalent to
estimating the 99.999% quantile of the loss severity.

FIGURE 5.13: Relationship between $\alpha_{\mathrm{CaR}}$ and $\alpha_{\mathrm{Severity}}$

FIGURE 5.14: Numerical illustration of the single loss approximation

The most popular subexponential distributions used in operational risk modeling are the
log-gamma, log-logistic, log-normal and Pareto probability distributions (BCBS, 2014f). For
instance, if $N(1) \sim \mathcal{P}(\lambda)$ and $X_i \sim \mathcal{LN}(\mu, \sigma^2)$, we obtain:

$$\mathrm{EL} = \lambda \exp\left(\mu + \frac{1}{2}\sigma^2\right)$$

and:

$$\mathrm{UL}(\alpha) = \exp\left(\mu + \sigma \Phi^{-1}\left(1 - \frac{1 - \alpha}{\lambda}\right)\right) - \exp\left(\mu + \frac{1}{2}\sigma^2\right)$$

In Figure 5.14, we report the results of some experiments for different values of parameters.
In the top panels, we assume that $\lambda = 100$, $\mu = 5.0$ and $\sigma = 2.0$ (left panel), and $\lambda =
500$, $\mu = 10.0$ and $\sigma = 2.5$ (right panel). These two examples correspond to medium
severity/low frequency and high severity/low frequency events. In these cases, we obtain
a good approximation. In the bottom panel, the parameters are $\lambda = 1000$, $\mu = 8.0$ and
$\sigma = 1.0$. The approximation does not work very well, because we have low severity/high
frequency events and the risk cannot then be explained by an extreme single loss. The
underestimation of the capital-at-risk is due to the underestimation of the number of losses.
In fact, with low severity/high frequency events, the risk is not to face a large single loss,
but to have a high number of losses in the year. This is why it is better to approximate the
capital-at-risk with the following formula:

$$G^{-1}(\alpha) \approx \left(P^{-1}(\alpha) - 1\right) \cdot \mathbb{E}[X_i] + F^{-1}\left(1 - \frac{1 - \alpha}{\mathbb{E}[N(1)]}\right)$$

where $P$ is the cumulative distribution function of the counting process $N(1)$. In Figure
5.14, we have also reported this approximation SLA* for the third example. We verify that
it gives better results for high frequency events than the classic approximation.
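Both approximations are straightforward to evaluate in the Poisson/log-normal case. The sketch below relies only on the Python standard library; the names `sla`, `sla_star`, `poisson_quantile`, `lognormal_quantile` and `severity_level` are illustrative. The Poisson quantile is computed by accumulating probabilities in log-space, which avoids the underflow of $e^{-\lambda}$ when $\lambda$ is large.

```python
import math
from statistics import NormalDist

def severity_level(alpha_car, mean_n):
    """Severity confidence level implied by the CaR confidence level."""
    return 1.0 - (1.0 - alpha_car) / mean_n

def lognormal_quantile(p, mu, sigma):
    """Quantile F^{-1}(p) of LN(mu, sigma^2)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

def poisson_quantile(alpha, lam):
    """Smallest n such that Pr{N <= n} >= alpha for N ~ Poisson(lam)."""
    n_cap = int(lam + 50.0 * math.sqrt(lam) + 50.0)   # safety bound on the search
    cdf = 0.0
    for n in range(n_cap + 1):
        cdf += math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))
        if cdf >= alpha:
            return n
    return n_cap

def sla(alpha, lam, mu, sigma):
    """Single loss approximation (5.16)."""
    mean_x = math.exp(mu + 0.5 * sigma ** 2)
    return (lam - 1.0) * mean_x + lognormal_quantile(severity_level(alpha, lam), mu, sigma)

def sla_star(alpha, lam, mu, sigma):
    """Variant SLA*: the mean number of losses is replaced by its alpha-quantile."""
    mean_x = math.exp(mu + 0.5 * sigma ** 2)
    n_alpha = poisson_quantile(alpha, lam)
    return (n_alpha - 1.0) * mean_x + lognormal_quantile(severity_level(alpha, lam), mu, sigma)
```

For the third example ($\lambda = 1000$, $\mu = 8$, $\sigma = 1$), `sla_star(0.999, ...)` is larger than `sla(0.999, ...)`, because the 99.9% quantile of the Poisson distribution exceeds its mean. This is the correction for the number of losses discussed above.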


5.3.3.3 Aggregation issues 
We recall that the loss at the bank level is equal to:

$$L = \sum_{k=1}^{K} S_k$$

where $S_k$ is the aggregate loss of the $k$-th cell of the mapping matrix. For instance,
if the matrix is composed of the eight business lines (BL) and seven event
types (ET) of the Basel II classification, we have $L = \sum_{k \in \mathcal{K}} S_k$ where $\mathcal{K} =
\{(\mathrm{BL}_{k_1}, \mathrm{ET}_{k_2}), k_1 = 1, \ldots, 8; k_2 = 1, \ldots, 7\}$. Let $\mathrm{CaR}_{k_1,k_2}(\alpha)$ be the capital charge
calculated for the business line $k_1$ and the event type $k_2$. We have:

$$\mathrm{CaR}_{k_1,k_2}(\alpha) = G^{-1}_{k_1,k_2}(\alpha)$$

One solution to calculate the capital charge at the bank level is to sum up all the capital
charges:

$$\mathrm{CaR}(\alpha) = \sum_{k_1=1}^{8} \sum_{k_2=1}^{7} \mathrm{CaR}_{k_1,k_2}(\alpha) = \sum_{k_1=1}^{8} \sum_{k_2=1}^{7} G^{-1}_{k_1,k_2}(\alpha)$$

From a theoretical point of view, this is equivalent to assuming that all the aggregate losses
$S_k$ are perfectly correlated. This approach is highly conservative and ignores diversification
effects between business lines and event types.

¹⁶ In Chapter 12, we will see that such transformation is common in extreme value theory.


Let us consider the two-dimensional case:

$$L = S_1 + S_2 = \sum_{i=1}^{N_1} X_i + \sum_{j=1}^{N_2} Y_j$$

In order to take into account the dependence between the two aggregate losses $S_1$ and
$S_2$, we can assume that frequencies $N_1$ and $N_2$ are correlated or severities $X_i$ and $Y_j$ are
correlated. Thus, the aggregate loss correlation $\rho(S_1, S_2)$ depends on two key parameters:

• the frequency correlation $\rho(N_1, N_2)$;
• the severity correlation $\rho(X_i, Y_j)$.

For example, we should observe that historically, the number of external fraud events is 
high (respectively low) when the number of internal fraud events is also high (respectively 
low). Severity correlation is more difficult to justify. In effect, a basic feature of the LDA 
model requires assuming that individual losses are jointly independent. Therefore it is con- 
ceptually difficult to assume simultaneously severity independence within each class of risk 
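and severity correlation between two classes. By assuming that $\rho(X_i, Y_j) = 0$, Frachot et
al. (2004) find an upper bound of the aggregate loss correlation. We have:

$$
\begin{aligned}
\operatorname{cov}(S_1, S_2) &= \mathbb{E}[S_1 S_2] - \mathbb{E}[S_1] \cdot \mathbb{E}[S_2] \\
&= \mathbb{E}\left[\sum_{i=1}^{N_1} X_i \cdot \sum_{j=1}^{N_2} Y_j\right] - \mathbb{E}\left[\sum_{i=1}^{N_1} X_i\right] \cdot \mathbb{E}\left[\sum_{j=1}^{N_2} Y_j\right] \\
&= \mathbb{E}[N_1 N_2] \cdot \mathbb{E}[X_i] \cdot \mathbb{E}[Y_j] - \mathbb{E}[N_1] \cdot \mathbb{E}[X_i] \cdot \mathbb{E}[N_2] \cdot \mathbb{E}[Y_j] \\
&= \left(\mathbb{E}[N_1 N_2] - \mathbb{E}[N_1] \cdot \mathbb{E}[N_2]\right) \cdot \mathbb{E}[X_i] \cdot \mathbb{E}[Y_j]
\end{aligned}
$$

and:

$$\rho(S_1, S_2) = \frac{\left(\mathbb{E}[N_1 N_2] - \mathbb{E}[N_1] \cdot \mathbb{E}[N_2]\right) \cdot \mathbb{E}[X_i] \cdot \mathbb{E}[Y_j]}{\sqrt{\operatorname{var}(S_1) \cdot \operatorname{var}(S_2)}}$$

If we assume that the counting processes $N_1$ and $N_2$ are Poisson processes with parameters
$\lambda_1$ and $\lambda_2$, we obtain:

$$\rho(S_1, S_2) = \rho(N_1, N_2) \cdot \eta(X_i) \cdot \eta(Y_j)$$

where:

$$\eta(X) = \frac{\mathbb{E}[X]}{\sqrt{\mathbb{E}[X^2]}} = \frac{1}{\sqrt{1 + \mathrm{CV}^2(X)}}$$

Here $\mathrm{CV}(X) = \sigma(X) / \mathbb{E}[X]$ denotes the coefficient of variation of the random variable $X$.
As a result, aggregate loss correlation is always lower than frequency correlation:

$$0 \leq \rho(S_1, S_2) \leq \rho(N_1, N_2) \leq 1$$

We deduce that an upper bound of the aggregate loss correlation is equal to:

$$\rho^{+} = \eta(X_i) \cdot \eta(Y_j)$$

For high severity events, severity independence likely dominates frequency correlation and
we obtain $\rho^{+} \approx 0$ because $\eta(X_i) \approx 0$.

Let us consider the example of log-normal severity distributions. We have:

$$\rho^{+} = \exp\left(-\frac{1}{2}\sigma_X^2 - \frac{1}{2}\sigma_Y^2\right)$$

We notice that this function is decreasing with respect to $\sigma_X$ and $\sigma_Y$. Figure 5.15 shows
the relationship between $\sigma_X$, $\sigma_Y$ and $\rho^{+}$. We verify that $\rho^{+}$ is small when $\sigma_X$ and $\sigma_Y$ take
large values. For instance, if $\sigma_X = \sigma_Y = 2$, the aggregate loss correlation is lower than 2%.

FIGURE 5.15: Upper bound $\rho^{+}$ of the aggregate loss correlation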
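The upper bound can be checked numerically in the log-normal case. In the sketch below, the function names are illustrative; it uses the fact that the squared coefficient of variation of $\mathcal{LN}(\mu, \sigma^2)$ is $e^{\sigma^2} - 1$, so that $\eta(X) = e^{-\sigma^2/2}$.

```python
import math

def eta_lognormal(sigma):
    """eta(X) = 1 / sqrt(1 + CV^2(X)) for a log-normal severity LN(mu, sigma^2)."""
    cv_squared = math.exp(sigma ** 2) - 1.0
    return 1.0 / math.sqrt(1.0 + cv_squared)

def rho_upper_bound(sigma_x, sigma_y):
    """Upper bound of the aggregate loss correlation, reached when rho(N1, N2) = 1."""
    return eta_lognormal(sigma_x) * eta_lognormal(sigma_y)
```

For $\sigma_X = \sigma_Y = 2$, the bound is $e^{-4} \approx 1.83\%$, which is consistent with the 2% figure quoted above.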


There are two ways to take into account correlations for computing the capital charge
of the bank. The first approach is to consider the normal approximation:

$$\mathrm{CaR}(\alpha) = \sum_{k} \mathrm{EL}_k + \sqrt{\sum_{k,k'} \rho_{k,k'} \cdot \left(\mathrm{CaR}_k(\alpha) - \mathrm{EL}_k\right) \cdot \left(\mathrm{CaR}_{k'}(\alpha) - \mathrm{EL}_{k'}\right)}$$

where $\rho_{k,k'}$ is the correlation between the cells $k$ and $k'$ of the mapping matrix. The second
approach consists in introducing the dependence between the aggregate losses using a copula
function $C$. The joint distribution of $(S_1, \ldots, S_K)$ has the following form:

$$\Pr\{S_1 \leq s_1, \ldots, S_K \leq s_K\} = C\left(G_1(s_1), \ldots, G_K(s_K)\right)$$

where $G_k$ is the cumulative distribution function of the $k$-th aggregate loss $S_k$. In this
case, the quantile of the random variable $L = \sum_{k=1}^{K} S_k$ is estimated using Monte Carlo
simulations. The difficulty comes from the fact that the distributions $G_k$ have no analytical
expression. The solution is then to use the method of empirical distributions, which is
presented on page 806.
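The normal approximation reduces to a quadratic form in the unexpected losses. A minimal sketch, assuming NumPy and an illustrative function name `aggregate_car`, is given below.

```python
import numpy as np

def aggregate_car(car, el, rho):
    """Bank-level capital charge under the normal approximation.

    car[k] and el[k] are the capital-at-risk and the expected loss of cell k,
    and rho is the correlation matrix between the cells.
    """
    car = np.asarray(car, dtype=float)
    el = np.asarray(el, dtype=float)
    rho = np.asarray(rho, dtype=float)
    ul = car - el                       # unexpected loss of each cell
    return el.sum() + np.sqrt(ul @ rho @ ul)
```

Two limiting cases help to interpret the formula. With a correlation matrix full of ones, the result is the simple sum of the capital charges. With the identity matrix, the unexpected losses are aggregated by the square root of the sum of squares, which gives the largest diversification benefit among non-negative correlations.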
5.3.4 Incorporating scenario analysis


The concept of scenario analysis deserves further clarification. Roughly speaking,
when we refer to scenario analysis, we want to express the idea that banks’ experts and
experienced managers have some reliable intuitions on the riskiness of their business and
that these intuitions are not entirely reflected in the bank’s historical internal data. As a first
requirement, we expect that experts should have the opportunity to give their approval to
capital charge results. In a second step, one can imagine that experts’ intuitions are directly
plugged into severity and frequency estimations. Experts’ intuition can be captured through
scenario building. More precisely, a scenario is given by a potential loss amount and the
corresponding probability of occurrence. As an example, an expert may assert that a loss of
one million dollars or higher is expected to occur once every (say) 5 years. This is valuable
information in many cases, either when loss data are rare and do not allow for statistically
sound results or when historical loss data are not sufficiently forward-looking. In this last
case, scenario analysis makes it possible to incorporate external loss data.

In what follows, we show how scenarios can be translated into restrictions on the param- 
eters of frequency and severity distributions. Once these restrictions have been identified, a 
calibration strategy can be designed where parameters are calibrated by maximizing some 
standard criterion subject to these constraints. As a result, parameter estimators can be seen 
as a mixture of the internal data-based estimator and the scenario-based implied estimator. 


5.3.4.1 Probability distribution of a given scenario 


We assume that the number of losses $N(t)$ is a Poisson process with intensity $\lambda$. Let $T_n$
be the arrival time of the $n$-th loss:

$$T_n = \inf\{t \geq 0 : N(t) = n\}$$

We know that the durations $\tau_n = T_n - T_{n-1}$ between two consecutive losses are iid exponential
random variables with parameter $\lambda$. We recall that the losses $X_n$ are also iid with
distribution $F$. We now denote by $\tau_n(x)$ the duration between two losses exceeding $x$. It is obvious
that the durations are iid. It now suffices to characterize $\tau_1(x)$. By using the fact that
a finite sum of exponential times follows an Erlang distribution, we have:

$$
\begin{aligned}
\Pr\{\tau_1(x) > t\} &= \sum_{n \geq 1} \Pr\{T_n > t; X_1 < x, \ldots, X_{n-1} < x; X_n \geq x\} \\
&= \sum_{n \geq 1} \Pr\{T_n > t\} \cdot F(x)^{n-1} \cdot \left(1 - F(x)\right) \\
&= \sum_{n \geq 1} F(x)^{n-1} \cdot \left(1 - F(x)\right) \cdot \left(\sum_{k=0}^{n-1} e^{-\lambda t} \frac{(\lambda t)^k}{k!}\right) \\
&= \left(1 - F(x)\right) \cdot \sum_{k \geq 0} e^{-\lambda t} \frac{(\lambda t)^k}{k!} \sum_{n \geq k+1} F(x)^{n-1} \\
&= e^{-\lambda t} \sum_{k \geq 0} \frac{\left(\lambda t F(x)\right)^k}{k!} \\
&= e^{-\lambda (1 - F(x)) t}
\end{aligned}
$$

We deduce that $\tau_n(x)$ follows an exponential distribution with parameter $\lambda(x) =
\lambda(1 - F(x))$. The average duration between two losses exceeding $x$ is also the mean of
$\tau_n(x)$:

$$\mathbb{E}[\tau_n(x)] = \frac{1}{\lambda(1 - F(x))}$$

Example 58 We assume that the annual number of losses follows a Poisson distribution
where $\lambda = 5$ and the severities of losses are log-normal $\mathcal{LN}(9, 4)$.

In Figure 5.16, we simulate the corresponding Poisson process $N(t)$ and also the events
whose loss is larger than 20 000 and 50 000 dollars. We then show the exponential distribution¹⁷
of $\tau_n(x)$.
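The intensity of the exceedance process in Example 58 can be computed directly. The sketch below uses the standard library only; `exceedance_intensity` and `mean_duration` are illustrative names, and the severity $\mathcal{LN}(9, 4)$ is read as $\mu = 9$ and $\sigma^2 = 4$, that is $\sigma = 2$.

```python
import math
from statistics import NormalDist

def exceedance_intensity(x, lam, mu, sigma):
    """lambda(x) = lambda * (1 - F(x)) for a log-normal severity LN(mu, sigma^2)."""
    survival = 1.0 - NormalDist().cdf((math.log(x) - mu) / sigma)
    return lam * survival

def mean_duration(x, lam, mu, sigma):
    """Average time (in years) between two losses exceeding x."""
    return 1.0 / exceedance_intensity(x, lam, mu, sigma)
```

With $\lambda = 5$, the intensities for the thresholds 20 000 and 50 000 are close to 1.629 and 0.907, which corresponds to an average waiting time of about 0.61 and 1.10 years.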
FIGURE 5.16: Simulation of the Poisson process $N(t)$ and peak over threshold events


5.3.4.2 Calibration of a set of scenarios 


Let us consider a scenario defined as “a loss of $x$ or higher occurs once every $d$ years”. By
assuming a compound Poisson distribution with a parametric severity distribution $F(x; \theta)$,
$\lambda$ is the average number of losses per year, $\lambda(x) = \lambda(1 - F(x; \theta))$ is the average number
of losses higher than $x$ and $1/\lambda(x)$ is the average duration between two losses exceeding $x$.
As a result, for a given scenario $(x, d)$, parameters $(\lambda, \theta)$ must satisfy:

$$d = \frac{1}{\lambda\left(1 - F(x; \theta)\right)}$$

Suppose that we face different scenarios $\{(x_s, d_s), s = 1, \ldots, n_S\}$. We may estimate the
implied parameters underlying the expert judgements using the quadratic criterion:

$$\left(\hat{\lambda}, \hat{\theta}\right) = \arg\min \sum_{s=1}^{n_S} w_s \cdot \left(d_s - \frac{1}{\lambda\left(1 - F(x_s; \theta)\right)}\right)^2$$

where $w_s$ is the weight of the $s$-th scenario. The previous approach belongs to the method of
moments. As a result, we can show that the optimal weights $w_s$ correspond to the inverse
of the variance of $d_s$:

$$w_s = \frac{1}{\operatorname{var}(d_s)} = \lambda^2 \left(1 - F(x_s; \theta)\right)^2$$

To solve the previous optimization program, we proceed by iterations. Let $(\hat{\lambda}_m, \hat{\theta}_m)$ be the
solution of the minimization program:

$$\left(\hat{\lambda}_m, \hat{\theta}_m\right) = \arg\min \sum_{s=1}^{n_S} \hat{\lambda}_{m-1}^2 \left(1 - F(x_s; \hat{\theta}_{m-1})\right)^2 \cdot \left(d_s - \frac{1}{\lambda\left(1 - F(x_s; \theta)\right)}\right)^2$$

Under some conditions, the estimator $(\hat{\lambda}_m, \hat{\theta}_m)$ converges to the optimal solution. We also
notice that we can simplify the optimization program by using the following approximation:

$$w_s = \frac{1}{\operatorname{var}(d_s)} = \frac{1}{\mathbb{E}^2[d_s]} \simeq \frac{1}{d_s^2}$$

¹⁷ For the parameter $\lambda(x)$, we have:

$$\lambda\left(2 \times 10^4\right) = 5 \times \left(1 - \Phi\left(\frac{\ln\left(2 \times 10^4\right) - 9}{2}\right)\right) = 1.629$$

and $\lambda\left(5 \times 10^4\right) = 0.907$.


Example 59 We assume that the severity distribution is log-normal and consider the
following set of expert’s scenarios:

 x_s (in $ mn)   |  1      2.5    5      7.5    10     20
 d_s (in years)  |  1/4    1      3      6      10     40

If $w_s = 1$, we obtain $\hat{\lambda} = 43.400$, $\hat{\mu} = 11.389$ and $\hat{\sigma} = 1.668$ (#1). Using the approximation
$w_s \simeq 1/d_s^2$, the estimates become $\hat{\lambda} = 154.988$, $\hat{\mu} = 10.141$ and $\hat{\sigma} = 1.855$ (#2). Finally, the
optimal estimates are $\hat{\lambda} = 148.756$, $\hat{\mu} = 10.181$ and $\hat{\sigma} = 1.849$ (#3). In the table below, we
report the estimated values of the duration. We notice that they are close to the expert’s
scenarios.

 x_s (in $ mn)   |  1       2.5     5       7.5     10       20
 #1              |  0.316   1.022   2.964   5.941   10.054   39.997
 #2              |  0.271   0.968   2.939   5.973   10.149   39.943
 #3              |  0.272   0.970   2.941   5.974   10.149   39.944
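The durations reported in the table can be recomputed from the three parameter sets. The sketch below uses the standard library only; `implied_duration`, `scenarios` and `estimates` are illustrative names, loss amounts are expressed in dollars, and the parameter values are those quoted in Example 59. Because these values are rounded to three decimals, the recomputed durations match the table only approximately.

```python
import math
from statistics import NormalDist

def implied_duration(x, lam, mu, sigma):
    """Model-implied duration d = 1 / (lambda * (1 - F(x))) for a log-normal severity."""
    survival = 1.0 - NormalDist().cdf((math.log(x) - mu) / sigma)
    return 1.0 / (lam * survival)

# Expert's scenarios: loss threshold (in dollars) -> duration (in years)
scenarios = {1e6: 0.25, 2.5e6: 1.0, 5e6: 3.0, 7.5e6: 6.0, 1e7: 10.0, 2e7: 40.0}

# Estimates (lambda, mu, sigma) quoted in Example 59
estimates = {
    "#1": (43.400, 11.389, 1.668),
    "#2": (154.988, 10.141, 1.855),
    "#3": (148.756, 10.181, 1.849),
}

def duration_table():
    """Implied durations for each parameter set and each scenario threshold."""
    return {
        name: [implied_duration(x, *params) for x in scenarios]
        for name, params in estimates.items()
    }
```

Comparing `duration_table()` with the values of `scenarios` shows where each weighting scheme fits best: unit weights favour the long durations, whereas the weighted schemes improve the fit of the short ones.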


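Remark 64 We can combine internal loss data, expert’s scenarios and external loss data¹⁸
by maximizing the penalized likelihood:

$$
\begin{aligned}
\hat{\theta} = \arg\max \; & \varpi_{\mathrm{internal}} \cdot \ell(\theta) - \varpi_{\mathrm{expert}} \cdot \sum_{s=1}^{n_S} \frac{1}{2} w_s \left(d_s - \frac{1}{\lambda\left(1 - F(x_s; \theta)\right)}\right)^2 \\
& - \varpi_{\mathrm{external}} \cdot \sum_{s=1}^{n_S^{\star}} \frac{1}{2} w_s^{\star} \left(d_s^{\star} - \frac{1}{\lambda\left(1 - F(x_s^{\star}; \theta)\right)}\right)^2
\end{aligned}
$$

where $\varpi_{\mathrm{internal}}$, $\varpi_{\mathrm{expert}}$ and $\varpi_{\mathrm{external}}$ are the weights reflecting the confidence placed on
internal loss data, expert’s scenarios and external loss data.

¹⁸ In this case, each external loss is treated as an expert’s scenario.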
5.3.5 Stability issue of the LDA model


One of the big issues of AMA (and LDA) models is their stability. It is obvious that the
occurrence of a large loss dramatically changes the estimated capital-at-risk, as explained
by Ames et al. (2015):


“Operational risk is fundamentally different from all other risks taken on by 
a bank. It is embedded in every activity and product of an institution, and 
in contrast to the conventional financial risks (e.g. market, credit) is harder 
to measure and model, and not straight forwardly eliminated through simple 
adjustments like selling off a position. While it varies considerably, operational 
risk tends to represent about 10-30% of the total risk pie, and has grown rapidly 
since the 2008-09 crisis. It tends to be more fat-tailed than other risks, and the 
data are poorer. As a result, models are fragile — small changes in the data 
have dramatic impacts on modeled output — and thus required operational risk 
capital is unstable”. 


In this context, the Basel Committee has decided to review the different measurement 
approaches to calculate the operational risk capital. In Basel III, advanced measurement 
approaches have been dropped. This decision marks a serious setback for operational risk 
modeling. The LDA model continues to be used by Basel II jurisdictions, and will continue 
to be used by large international banks, because it is the only way to assess an economic 
capital using internal loss data. Moreover, internal losses continue to be collected by banks 
in order to implement the SMA of Basel III. Finally, the LDA model will certainly become 
the standard model for satisfying Pillar 2 requirements. However, solutions for stabilizing
the LDA model can only be partial and even hazardous or counter-intuitive, because they
ignore the nature of operational risk.


5.4 Exercises 
5.4.1 Estimation of the loss severity distribution 


We consider a sample of $n$ individual losses $\{x_1, \ldots, x_n\}$. We assume that they can be
described by different probability distributions:

(i) $X$ follows a log-normal distribution $\mathcal{LN}(\mu, \sigma^2)$.

(ii) $X$ follows a Pareto distribution $\mathcal{P}(\alpha, x_-)$ defined by:

$$\Pr\{X \leq x\} = 1 - \left(\frac{x}{x_-}\right)^{-\alpha}$$

where $x \geq x_-$ and $\alpha > 0$.

(iii) $X$ follows a gamma distribution $\mathcal{G}(\alpha, \beta)$ defined by:

$$\Pr\{X \leq x\} = \int_0^x \frac{\beta^{\alpha} t^{\alpha - 1} e^{-\beta t}}{\Gamma(\alpha)} \, \mathrm{d}t$$

where $x \geq 0$, $\alpha > 0$ and $\beta > 0$.

(iv) The natural logarithm of the loss $X$ follows a gamma distribution: $\ln X \sim \mathcal{G}(\alpha, \beta)$.

1. We consider the case (i).

(a) Show that the probability density function is:

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2} \left(\frac{\ln x - \mu}{\sigma}\right)^2\right)$$

(b) Calculate the first two moments of $X$. Deduce the orthogonality conditions of the
generalized method of moments.

(c) Find the maximum likelihood estimators $\hat{\mu}$ and $\hat{\sigma}$.

2. We consider the case (ii).

(a) Calculate the first two moments of $X$. Deduce the GMM conditions for estimating
the parameter $\alpha$.

(b) Find the maximum likelihood estimator $\hat{\alpha}$.

3. We consider the case (iii). Write the log-likelihood function associated to the sample
of individual losses $\{x_1, \ldots, x_n\}$. Deduce the first-order conditions of the maximum
likelihood estimators $\hat{\alpha}$ and $\hat{\beta}$.

4. We consider the case (iv). Show that the probability density function of $X$ is:

$$f(x) = \frac{\beta^{\alpha} (\ln x)^{\alpha - 1}}{\Gamma(\alpha) \, x^{\beta + 1}}$$

What is the support of this probability density function? Write the log-likelihood
function associated to the sample of individual losses $\{x_1, \ldots, x_n\}$.

5. We now assume that the losses $\{x_1, \ldots, x_n\}$ have been collected beyond a threshold
$H$ meaning that $X \geq H$.

(a) What does the generalized method of moments become in the case (i)?

(b) Calculate the maximum likelihood estimator $\hat{\alpha}$ in the case (ii).

(c) Write the log-likelihood function in the case (iii).


5.4.2 Estimation of the loss frequency distribution 


We consider a dataset of individual losses $\{x_1, \ldots, x_n\}$ corresponding to a sample of $T$
annual loss numbers $\{N_{Y_1}, \ldots, N_{Y_T}\}$. This implies that:

$$\sum_{t=1}^{T} N_{Y_t} = n$$

If we measure the number of losses per quarter $\{N_{Q_1}, \ldots, N_{Q_{4T}}\}$, the previous equation
becomes:

$$\sum_{t=1}^{4T} N_{Q_t} = n$$

1. We assume that the annual number of losses follows a Poisson distribution $\mathcal{P}(\lambda_Y)$. Calculate
the maximum likelihood estimator $\hat{\lambda}_Y$ associated to the sample $\{N_{Y_1}, \ldots, N_{Y_T}\}$.

2. We assume that the quarterly number of losses follows a Poisson distribution
$\mathcal{P}(\lambda_Q)$. Calculate the maximum likelihood estimator $\hat{\lambda}_Q$ associated to the sample
$\{N_{Q_1}, \ldots, N_{Q_{4T}}\}$.


3. What is the impact of considering a quarterly or annual basis on the computation of 
the capital charge? 


4. What does this result become if we consider a method of moments based on the first 
moment? 


5. Same question if we consider a method of moments based on the second moment. 


5.4.3 Using the method of moments in operational risk models 


1. Let $N(t)$ be the number of losses for the time interval $[0, t]$. We note $\{N_1, \ldots, N_T\}$
a sample of $N(t)$ and we assume that $N(t)$ follows a Poisson distribution $\mathcal{P}(\lambda)$. We
recall that:


(a) Calculate the first moment $\mathbb{E}[N(t)]$.
(b) Show the following result:

$$\mathbb{E}\left[\prod_{i=0}^{m} \left(N(t) - i\right)\right] = \lambda^{m+1}$$

Then deduce the variance of $N(t)$.


(c) Propose two estimators based on the method of moments. 


2. Let $S$ be the random sum:

$$S = \sum_{i=1}^{N(t)} X_i$$

where $X_i \sim \mathcal{LN}(\mu, \sigma^2)$, $X_i \perp X_j$ and $N(t) \sim \mathcal{P}(\lambda)$.

(a) Calculate the mathematical expectation $\mathbb{E}[S]$.
(b) We recall that:

$$\left(\sum_{i=1}^{n} x_i\right)^2 = \sum_{i=1}^{n} x_i^2 + \sum_{i \neq j} x_i x_j$$

Show that:

$$\operatorname{var}(S) = \lambda \exp\left(2\mu + 2\sigma^2\right)$$

(c) How can we estimate $\mu$ and $\sigma$ if we have already calibrated $\lambda$?

3. We assume that the annual number of losses follows a Poisson distribution $\mathcal{P}(\lambda)$. We
also assume that the individual losses are independent and follow a Pareto distribution
$\mathcal{P}(\alpha, x_-)$ defined by:

$$\Pr\{X \leq x\} = 1 - \left(\frac{x}{x_-}\right)^{-\alpha}$$

where $x \geq x_-$ and $\alpha > 1$.

(a) Show that the duration between two consecutive losses that are larger than $x$ is
an exponential distribution with parameter $\lambda x_-^{\alpha} x^{-\alpha}$.


(b) How can we use this result to calibrate experts’ scenarios? 


5.4.4 Calculation of the Basel II required capital 
We consider the simplified balance sheet of a bank, which is described below. 


1. In the Excel file, we provide the price evolution of stocks A and B. The trading
portfolio consists of 10 000 shares A and 25 000 shares B. Calculate the daily historical
VaR of this portfolio by assuming that the current stock prices are equal to $105.5
and $353. Deduce the capital charge for market risk assuming that the VaR has not
fundamentally changed during the last 3 months¹⁹.


2. We consider that the credit portfolio of the bank can be summarized by 4 meta-credits
whose characteristics are the following:

             | Sales     | EAD       | PD  | LGD | M
 Bank        |           | $80 mn    | 1%  | 75% | 1.0
 Corporate   | $500 mn   | $200 mn   | 5%  | 60% | 2.0
 SME         | $30 mn    | $50 mn    | 2%  | 40% | 4.5
 Mortgage    |           | $50 mn    | 9%  | 45% |
 Retail      |           | $100 mn   | 4%  | 85% |


Calculate the IRB capital charge for the credit risk. 


3. We assume that the bank is exposed to a single operational risk. The severity
distribution is a log-normal probability distribution $\mathcal{LN}(8, 4)$, whereas the frequency
distribution is the following discrete probability distribution:

$$\Pr\{N = 5\} = 60\%, \qquad \Pr\{N = 10\} = 40\%$$

Calculate the AMA capital charge for the operational risk.


4. Deduce the capital charge of the bank and the capital ratio knowing that the capital 
of the bank is equal to $70 mn. 


5.4.5 Parametric estimation of the loss severity distribution 
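1. We assume that the severity losses are log-logistic distributed $X_i \sim \mathcal{LL}(\alpha, \beta)$ where:

$$F(x; \alpha, \beta) = \frac{(x/\alpha)^{\beta}}{1 + (x/\alpha)^{\beta}}$$

(a) Find the density function.
(b) Deduce the log-likelihood function of the sample $\{x_1, \ldots, x_n\}$.
(c) Show that the ML estimators satisfy the following first-order conditions:

$$\sum_{i=1}^{n} F\left(x_i; \hat{\alpha}, \hat{\beta}\right) = \frac{n}{2}$$

$$\sum_{i=1}^{n} \left(2 F\left(x_i; \hat{\alpha}, \hat{\beta}\right) - 1\right) \ln x_i = \frac{n}{\hat{\beta}}$$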

¹⁹ The multiplication coefficient $\xi$ is set equal to 0.5.




(d) The sample of loss data is 2 918, 740, 3 985, 2 827, 2 839, 6 897, 7 665, 3 766, 3 107
and 3 304. Verify that $\hat{\alpha} = 3\,430.050$ and $\hat{\beta} = 3.315$ are the ML estimates.
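One possible numerical check, given here as a sketch and not as the intended solution, is to evaluate the two first-order conditions of question (c) at the proposed estimates. The names `loglogistic_cdf` and `foc_residuals` are illustrative. Both residuals should be close to zero, up to the rounding of the estimates.

```python
import math

losses = [2918, 740, 3985, 2827, 2839, 6897, 7665, 3766, 3107, 3304]

def loglogistic_cdf(x, alpha, beta):
    """F(x; alpha, beta) = (x/alpha)^beta / (1 + (x/alpha)^beta)."""
    r = (x / alpha) ** beta
    return r / (1.0 + r)

def foc_residuals(sample, alpha, beta):
    """Residuals of the two first-order conditions of the log-logistic likelihood."""
    n = len(sample)
    cdf = [loglogistic_cdf(x, alpha, beta) for x in sample]
    r1 = sum(cdf) - n / 2.0
    r2 = sum((2.0 * c - 1.0) * math.log(x) for c, x in zip(cdf, sample)) - n / beta
    return r1, r2
```

Evaluating `foc_residuals(losses, 3430.050, 3.315)` gives two values that are very small compared with $n/2 = 5$ and $n/\hat{\beta} \approx 3.017$, which supports the stated estimates.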


(e) What does the log-likelihood function of the sample $\{x_1, \ldots, x_n\}$ become if we
assume that the losses were collected beyond a threshold $H$?


5.4.6 Mixed Poisson process 


1. We consider the mixed Poisson process where $N(t) \sim \mathcal{P}(\Lambda)$ and $\Lambda$ is a random
variable. Show that:

$$\operatorname{var}(N(t)) = \mathbb{E}[N(t)] + \operatorname{var}(\Lambda)$$

2. Deduce that $\operatorname{var}(N(t)) \geq \mathbb{E}[N(t)]$. Determine the probability distribution $\Lambda$ such
that the equality holds. Let $\varphi(n)$ be the following ratio:

$$\varphi(n) = \frac{(n + 1) \cdot p(n + 1)}{p(n)}$$

Show that $\varphi(n)$ is constant.

3. We assume that $\Lambda \sim \mathcal{G}(\alpha, \beta)$.

(a) Calculate $\mathbb{E}[N(t)]$ and $\operatorname{var}(N(t))$.

(b) Show that $N(t)$ has a negative binomial distribution $\mathcal{NB}(r, p)$. Calculate the
parameters $r$ and $p$ with respect to $\alpha$ and $\beta$.

(c) Show that $\varphi(n)$ is an affine function.

4. We assume that $\Lambda \sim \mathcal{E}(\lambda)$.

(a) Calculate $\mathbb{E}[N(t)]$ and $\operatorname{var}(N(t))$.
(b) Show that $N(t)$ has a geometric distribution $\mathcal{G}(p)$. Determine the parameter $p$.


Chapter 6 


Liquidity Risk 


Liquidity is a long-standing issue and also an elusive concept (Grossman and Miller, 1988). It 
cannot be observed directly, because it measures the ease of trading an asset. More precisely, 
it measures the asset’s ability to be sold as soon as possible without causing a significant 
price movement. This is why it is difficult to capture liquidity in a single measure (bid-ask 
spread, trading volume, etc.). Moreover, liquidity risk generally refers to two related notions: 
market liquidity and funding liquidity. Market liquidity concerns assets. For instance, the 
most liquid asset is cash because it can always be used easily and immediately. Many stocks 
and sovereign bonds are considered fairly liquid, because they can be sold in the day. On the 
contrary, private equity and real estate are less liquid assets, because it can take months to 
sell them. Funding liquidity concerns asset liability mismatch due to liquidity and maturity 
transformation activities. According to Drehmann and Nikolaou (2013), funding liquidity 
is defined “as the ability to settle obligations with immediacy. It follows that, a bank is 
illiquid if it is unable to settle obligations in time”. The concept of funding liquidity is of 
course important for banks, but also for other financial entities (insurance companies, asset 
managers, hedge funds, etc.). 


This chapter is organized as follows. The first section is dedicated to the measurement
of asset liquidity. In the second section, we consider how funding liquidity affects the risk
of financial institutions. The last section presents the regulatory framework for managing
liquidity risk in a bank. This chapter may be viewed as an introduction to liquidity risk.
The topic is developed further in Chapter 7, which focuses on asset liability management risk,
and is complemented by Chapter 8, which is dedicated to systemic risk, because liquidity and
systemic risks are highly connected.


6.1 Market liquidity 


Sarr and Lybek (2002) propose to classify market liquidity measures into four categories: 
(1) transaction cost measures, (2) volume-based measures, (3) equilibrium price-based mea- 
sures, and (4) market-impact measures. The choice of one measure depends on the objective 
of the liquidity measurement. The first category is useful for investors, who would like to 
know the cost of selling or buying immediately a security (stocks, bonds, futures, etc.). The 
second category is related to the breadth of the market and measures the trading activity 
of a security. The last two categories (price-based and market-impact measures) concern 
more the resilience and the efficiency of the market. The underlying idea is to understand 
how trading prices can move away from fundamental prices. By construction, these last two 
categories are more developed by academics whereas investors are more concerned by the
first two categories.

6.1.1 Transaction cost versus volume-based measures
6.1.1.1 Bid-ask spread
The traditional liquidity measure is the bid-ask quoted spread $S_t$, which is defined by:

$$S_t = \frac{P_t^{\mathrm{ask}} - P_t^{\mathrm{bid}}}{P_t^{\mathrm{mid}}}$$

where $P_t^{\mathrm{ask}}$, $P_t^{\mathrm{bid}}$ and $P_t^{\mathrm{mid}}$ are the ask, bid and mid¹ quotes for a given security at time $t$.
By construction, the bid-ask spread can only be computed in an organized exchange with
order books. Here, the ask price corresponds to the lowest price of sell orders, whereas
the bid price is the highest price of buy orders. In this context, $S_t$ may be viewed as a
transaction cost measure and is the standard liquidity measure in equity markets.


TABLE 6.1: An example of a limit order book

            |        Buy orders         |        Sell orders
 i-th limit | Q_t^{bid,i}   P_t^{bid,i} | Q_t^{ask,i}   P_t^{ask,i}
     1      |    65201        26.325    |    70201        26.340
     2      |    85201        26.320    |   116201        26.345
     3      |   105201        26.315    |   107365        26.350
     4      |    76500        26.310    |    35000        26.355
     5      |    20000        26.305    |    35178        26.360

Example 60 In Table 6.1, we provide a snapshot of the limit order book of the Lyxor Euro
Stoxx 50 ETF recorded at NYSE Euronext Paris². \(Q_t^{\mathrm{bid},i}\) and \(P_t^{\mathrm{bid},i}\) (resp. \(Q_t^{\mathrm{ask},i}\) and \(P_t^{\mathrm{ask},i}\))
indicate the quantity and the price of the buyer (resp. the seller) for the \(i\)-th limit.


This limit order book is represented in Figure 6.1, where the x-axis represents the
quoted prices and the y-axis represents the buy and sell quantities. The bid and ask prices
correspond to the prices of the best limit. We have \(P_t^{\mathrm{bid}} = 26.325\) and \(P_t^{\mathrm{ask}} = 26.340\),
implying that the mid price is equal to:

\[ P_t^{\mathrm{mid}} = \frac{26.325 + 26.340}{2} = 26.3325 \]

We deduce that the bid-ask spread is:

\[ S_t = \frac{26.340 - 26.325}{26.3325} \approx 5.696 \text{ bps} \]
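The computation of Example 60 can be sketched in a few lines of Python. This is only an illustration: the function name `quoted_spread` and the list layout of the order book are our own assumptions, whereas the quantities and prices are those of Table 6.1.

```python
# Illustrative sketch: quoted bid-ask spread from the best limits of Table 6.1.
# The function name and the data layout are hypothetical choices.

def quoted_spread(bid, ask):
    """Return (mid price, relative bid-ask spread) for the best bid and ask."""
    mid = (ask + bid) / 2.0
    return mid, (ask - bid) / mid


# (quantity, price) for each limit of the order book
buy_orders = [(65201, 26.325), (85201, 26.320), (105201, 26.315),
              (76500, 26.310), (20000, 26.305)]
sell_orders = [(70201, 26.340), (116201, 26.345), (107365, 26.350),
               (35000, 26.355), (35178, 26.360)]

# The bid is the highest buy price, the ask is the lowest sell price
best_bid = max(price for _, price in buy_orders)
best_ask = min(price for _, price in sell_orders)

mid, spread = quoted_spread(best_bid, best_ask)
print(f"mid = {mid:.4f}, spread = {spread * 1e4:.3f} bps")
```

The spread is expressed relative to the mid price, so multiplying by 10 000 gives the figure in basis points.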


There are other variants of the bid-ask spread, which do not use quoted prices, but
traded prices. For instance, the effective spread is equal to:

\[ S_\tau^{e} = 2 \, \frac{\left| P_\tau - P_t^{\mathrm{mid}} \right|}{P_t^{\mathrm{mid}}} \]
1 We have:

\[ P_t^{\mathrm{mid}} = \frac{P_t^{\mathrm{ask}} + P_t^{\mathrm{bid}}}{2} \]


2 The corresponding date is 14:00:00 and 56,566 microseconds on 28 December 2012.

FIGURE 6.1: An example of a limit order book (x-axis: quoted prices; y-axis: buy and sell quantities)


where \(\tau\) is the trade index, \(P_\tau\) is the price of the \(\tau\)-th trade and \(P_t^{\mathrm{mid}}\) is the midpoint of
the market quote calculated at the time \(t\) of the \(\tau\)-th trade. In a similar way, the realized spread
uses the same formula as the effective spread, but replaces \(P_t^{\mathrm{mid}}\) by the mid quote of the
security at time \(t + \Delta\):

\[ S_\tau^{r} = 2 \, \frac{\left| P_\tau - P_{t+\Delta}^{\mathrm{mid}} \right|}{P_{t+\Delta}^{\mathrm{mid}}} \]

Generally, \(\Delta\) is set to five minutes. The realized spread \(S_\tau^{r}\) represents the temporary
component of the effective spread (Goyenko et al., 2009). In this case, \(P_{t+\Delta}^{\mathrm{mid}}\) may be viewed as
the equilibrium price of the security after the trade³. In particular, if the trade has a price
impact, we have \(P_{t+\Delta}^{\mathrm{mid}} \neq P_t^{\mathrm{mid}}\) and \(S_\tau^{r} \neq S_\tau^{e}\).


6.1.1.2 Trading volume 


The second popular measure is the trading volume \(V_t\), which indicates the dollar value
of the security exchanged during the period \(t\):

\[ V_t = \sum_{\tau \in t} Q_\tau P_\tau \]


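3 Another variant of the realized spread is the signed spread:

\[ S_\tau^{\sigma} = 2 \, s_\tau \left( \frac{P_\tau - P_{t+\Delta}^{\mathrm{mid}}}{P_{t+\Delta}^{\mathrm{mid}}} \right) \]

where:

\[ s_\tau = \begin{cases} +1 & \text{if the trade is a buy} \\ -1 & \text{if the trade is a sell} \end{cases} \]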
where \(Q_\tau\) and \(P_\tau\) are the quantity and the price of the \(\tau\)-th trade during the period. Generally, we
consider a one-day period and use the following approximation:

\[ V_t \approx Q_t P_t \]

where \(Q_t\) is the number of securities traded during the day \(t\) and \(P_t\) is the closing price of
the security.


A related measure is the turnover, which is the ratio between the trading volume and
the free float market capitalization \(M_t\) of the asset:

\[ T_t = \frac{V_t}{M_t} = \frac{Q_t P_t}{N_t P_t} \]

where \(N_t\) is the number of outstanding ‘floating’ shares. The asset turnover ratio indicates
how many times each share changes hands in a given period⁴. For instance, if the annual
turnover is two, this means that the shares have changed hands, on average, two times
during the year.


Example 61 We consider a stock, whose average daily volume \(Q_t\) is equal to 1200 shares
whereas the total number of shares \(N_t\) is equal to 500000. We assume that the price is equal
to $13 500.


We deduce that the daily volume is:

\[ V_t = 1200 \times 13\,500 = \$16.2 \text{ mn} \]

Because the market capitalization \(M_t\) is equal to $6.75 bn, the daily turnover represents
0.24%. It follows that the annualized turnover⁵ is about 62%.
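As a quick check of Example 61, the following sketch recomputes the daily volume, the market capitalization and the turnover. The function names are hypothetical, and the annualization factor of 260 trading days is the one used in the example.

```python
# Illustrative sketch of the volume and turnover figures of Example 61.
# Function names are hypothetical.

def trading_volume(quantity, price):
    """Daily approximation V_t = Q_t * P_t (dollar value traded)."""
    return quantity * price


def turnover(volume, market_cap):
    """Turnover T_t = V_t / M_t."""
    return volume / market_cap


q_t = 1_200        # average daily volume (number of shares)
n_t = 500_000      # number of floating shares
p_t = 13_500.0     # price in dollars

v_t = trading_volume(q_t, p_t)          # daily dollar volume
m_t = n_t * p_t                         # free float market capitalization
daily_turnover = turnover(v_t, m_t)
annual_turnover = 260 * daily_turnover  # 260 trading days per year

print(f"V_t = ${v_t / 1e6:.1f} mn, M_t = ${m_t / 1e9:.2f} bn")
print(f"daily turnover = {daily_turnover:.2%}, annualized = {annual_turnover:.1%}")
```

Because the price cancels out in the ratio, the turnover only depends on the number of shares traded and the number of floating shares.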


6.1.1.3 Liquidation ratio 


Another popular measure is the liquidation ratio \(\mathrm{LR}(m)\), which measures the proportion
of a given position that can be liquidated after \(m\) trading days. This statistic depends on the
size of the position and the liquidation policy. A simple rule is to define a maximum number
of shares that can be sold every day. The market convention is to consider a proportion of
the three-month average daily volume (ADV). This serves as a proxy to bound liquidation
costs: the higher the proportion of the ADV, the larger the trading costs. Another interesting
statistic is the liquidation time \(\mathrm{LR}^{-1}(p)\), which is the inverse function of the liquidation ratio.
It indicates the number of required trading days in order to liquidate a proportion \(p\) of the
position.


Example 62 We consider again Example 61. Suppose that we have a position of $30 mn 
in this stock. In order to minimize trading impacts, the liquidation policy is set to 25% of 
the average daily volume. 


The liquidation policy implies that we can sell 25% × 16.2 = $4.05 mn every day. We
deduce that:

\[ \mathrm{LR}(1) = \frac{4.05}{30} = 13.5\% \]

\[ \mathrm{LR}(2) = \frac{2 \times 4.05}{30} = 27\% \]


4 This is why this ratio is generally expressed on an annual basis.
5 We multiply the daily turnover by a factor of 260.
Finally, we have:

\[ \mathrm{LR}(m) = \begin{cases} m \times 13.5\% & \text{if } m \leq 7 \\ 100\% & \text{if } m \geq 8 \end{cases} \]

The liquidation of the position requires 8 trading days.

We now consider a portfolio invested into \(n\) assets. We denote \((x_1, \ldots, x_n)\) the number
of shares held in the portfolio. Let \(P_{i,t}\) be the current price of asset \(i\). The value of the
portfolio is equal to \(\sum_{i=1}^{n} x_i \cdot P_{i,t}\). For each asset that composes the portfolio, we denote
\(x_i^{+}\) the maximum number of shares for asset \(i\) that can be sold during a trading day. The
number of shares \(x_i(m)\) liquidated after \(m\) trading days is defined as follows:

\[ x_i(m) = \min\left( \left( x_i - \sum_{k=0}^{m-1} x_i(k) \right)^{+}, \; x_i^{+} \right) \]

with \(x_i(0) = 0\). The liquidation ratio \(\mathrm{LR}(m)\) is then the proportion of the portfolio liquidated
after \(m\) trading days:

\[ \mathrm{LR}(m) = \frac{\sum_{i=1}^{n} \sum_{k=0}^{m} x_i(k) \cdot P_{i,t}}{\sum_{i=1}^{n} x_i \cdot P_{i,t}} \]
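The liquidation ratio and the liquidation time can be sketched as follows. This is a minimal illustration and not a reference implementation: we assume that, every day, we sell the remaining shares of each asset up to its daily cap, and the function names `liquidation_ratio` and `liquidation_time` are our own. The numerical inputs are those of Example 62.

```python
# Illustrative sketch of the liquidation ratio LR(m) and the liquidation time.
# Assumption: each day we sell min(remaining shares, daily cap) of every asset.

def liquidation_ratio(shares, prices, daily_caps, m):
    """Proportion of the portfolio value liquidated after m trading days."""
    remaining = list(shares)
    liquidated_value = 0.0
    for _ in range(m):
        for i, (price, cap) in enumerate(zip(prices, daily_caps)):
            sold = min(remaining[i], cap)  # shares of asset i sold on that day
            remaining[i] -= sold
            liquidated_value += sold * price
    total_value = sum(x * p for x, p in zip(shares, prices))
    return liquidated_value / total_value


def liquidation_time(shares, prices, daily_caps, p, tol=1e-12):
    """Smallest number of trading days m such that LR(m) >= p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    # Conservative guard: refuse assets that can never be sold
    if any(x > 0 and cap <= 0 for x, cap in zip(shares, daily_caps)):
        raise ValueError("an asset with a zero daily cap can never be liquidated")
    m = 0
    while liquidation_ratio(shares, prices, daily_caps, m) < p - tol:
        m += 1
    return m


# Example 62: $30 mn position, price of $13,500, ADV of 1,200 shares, 25% of ADV
price = 13_500.0
shares = [30e6 / price]
caps = [0.25 * 1_200]

lr_1 = liquidation_ratio(shares, [price], caps, 1)
lr_2 = liquidation_ratio(shares, [price], caps, 2)
days = liquidation_time(shares, [price], caps, 1.0)
print(f"LR(1) = {lr_1:.1%}, LR(2) = {lr_2:.1%}, liquidation time = {days} days")
```

The same two functions apply to a multi-asset portfolio by passing lists of holdings, prices and daily caps of equal length.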


TABLE 6.2: Statistics of the liquidation ratio (size = $10 bn, liquidation policy = 10% of
ADV)

Statistics     SPX     SX5E    DAX     NDX     MSCI EM   MSCI INDIA   MSCI EMU SC
m (in days)    Liquidation ratio LR(m) in %
1              88.4    12.3    4.8     40.1    22.1      1.5          3.0
2              99.5    24.7    9.6     72.6    40.6      3.0          6.0
5              100.0   58.8    24.1    99.7    75.9      7.6          14.9
10             100.0   90.1    47.6    99.9    93.9      15.1         29.0
α (in %)       Liquidation time LR^{-1}(α) in days
50             1       5       11      2       3         37           21
75             1       7       17      3       5         71           43
90             2       10      23      3       9         110          74
99             2       15      29      5       17        156          455


Source: Roncalli and Weisang (2015). 


In Table 6.2, we report the liquidation ratio and the liquidation time for several equity
index portfolios using a size of $10 bn and assuming we can sell 10% of the ADV every
day⁶. The indices are the S&P 500 index (SPX), Euro Stoxx 50 index (SX5E), DAX index,
NASDAQ 100 index (NDX), MSCI EM index, MSCI INDIA index and MSCI EMU Small
Cap index. We read the results as follows: LR(1) is equal to 88.4% for the S&P 500 index,
meaning that we can liquidate 88.4% (or $8.84 bn) of the portfolio on the first trading
day; LR(5) is equal to 24.1% for the DAX index, meaning that we can liquidate 24.1%
of the assets after five trading days; LR^{-1}(75%) is equal to 43 for the MSCI EMU Small
Cap index, meaning that we need 43 trading days to liquidate $7.5 bn for this portfolio.
We observe that the liquidation risk profile is different from one equity index portfolio to 
another. 
TABLE 6.3: Statistics of the liquidation ratio (size = $10 bn, liquidation policy = 30% of
ADV)

Statistics     SPX     SX5E    DAX     NDX     MSCI EM   MSCI INDIA   MSCI EMU SC
m (in days)    Liquidation ratio LR(m) in %
1              100.0   37.0    14.5    91.0    55.5      4.5          9.0
2              100.0   67.7    28.9    99.8    81.8      9.1          17.8
5              100.0   99.2    68.6    100.0   98.5      22.6         40.4
10             100.0   100.0   99.6    100.0   100.0     43.1         63.2
α (in %)       Liquidation time LR^{-1}(α) in days
50             1       2       4       1       1         13           7
75             1       3       6       1       2         24           15
90             1       4       8       1       3         37           25
99             1       5       10      2       6         52           152


Source: Roncalli and Weisang (2015). 


These figures depend on the liquidation policy and the liquidation size. For instance, if
we use a liquidation policy of 30% of the average daily volume, we obtain the results given in Table 6.3. In this case,
liquidation ratios are improved. Nevertheless, we continue to observe that all these indices do
not present the same liquidity profile. In Figure 6.2, we report the liquidation ratio for 
different indices. We notice that the liquidity profile is better for the S&P 500 index for a 
size of $50 bn than for the Euro Stoxx 50 index for a size of $10 bn. We also observe that 
liquidating $1 bn of MSCI INDIA index is approximately equivalent to liquidating $10 bn 
of Euro Stoxx 50 index. These results depend on the free-float market capitalization of each 
index. For instance, the capitalization of the S&P 500 is equal to $18 tn at the end of April 
2015. This contrasts with the capitalization of the MSCI EMU Small Cap, which is equal 
to $448 bn. 


6.1.1.4 Liquidity ordering 


The bid-ask spread and the daily trading volume are easily available in financial
information systems (Bloomberg, Reuters, etc.). They represent two aspects of liquidity. \(S_t\)
is an estimate of the trading cost in the case of small orders. When we consider a large
order, \(S_t\) is not valid because the order may have an impact on the price and it may also
not be possible to trade immediately. In this case, it is better to consider \(V_t\), which gives
the average trading activity of the security. Indeed, the investor may compare the size of
his order with the depth of the market.


These two statistics may be used to compare the liquidity \(\mathcal{L}\) of securities \(i\) and \(j\). We
will say that the liquidity of security \(i\) is better than the liquidity of security \(j\) if security \(i\)
has a lower bid-ask spread:

\[ S_{i,t} < S_{j,t} \Rightarrow \mathcal{L}(i) \succ \mathcal{L}(j) \]

or if security \(i\) has a higher trading volume:

\[ V_{i,t} > V_{j,t} \Rightarrow \mathcal{L}(i) \succ \mathcal{L}(j) \]


6 For the composition of the portfolio and the ADV statistics, Roncalli and Weisang (2015) use the data
of 30 April 2015.

FIGURE 6.2: Comparing the liquidation ratio (in %) between index fund portfolios


Source: Roncalli and Weisang (2015). 


It would be wrong to think that the two measures \(S_t\) and \(V_t\) give the same liquidity
ordering. For instance, Roncalli and Zheng (2015) show that it is far from being the case
in European ETF markets. In fact, liquidity is a multi-faceted concept and covers various
dimensions. This explains the proliferation of other liquidity measures.


6.1.2 Other liquidity measures 


The Hui-Heubel liquidity ratio is a measure of the resilience and the depth. It combines
turnover and price impact:

\[ H_t^2 = \frac{1}{T_t} \left( \frac{P_t^{\mathrm{high}} - P_t^{\mathrm{low}}}{P_t^{\mathrm{low}}} \right) \]

where \(P_t^{\mathrm{high}}\) and \(P_t^{\mathrm{low}}\) are the highest and lowest prices during the period \(t\), and \(T_t\) is the
turnover observed for the same period. \(H_t^2\) can be calculated on a daily basis or with a
higher frequency period. For example, Sarr and Lybek (2002) propose to consider a 5-day
period in order to capture medium-term price impacts.


Among price-based measures, Sarr and Lybek (2002) include the variance ratio of
Hasbrouck and Schwartz (1988), also called the market efficiency coefficient (MEC), which is
the ratio between the annualized variance of long-period returns \(R_{t,t+h}\) (\(h \gg 1\)) and the
annualized variance of short-period returns \(R_{t,t+1}\):

\[ \mathrm{VR} = \frac{\operatorname{var}\left(R_{t,t+h}\right)}{\operatorname{var}\left(R_{t,t+1}\right)} \]

However, this ratio may not be relevant because it is related to the reversal alternative
risk premium or the auto-correlation trading strategy. In fact, this ratio is another measure
of the auto-correlation of asset returns. Goyenko et al. (2009) define the price impact as the
“cost of demanding additional instantaneous liquidity”. In this case, it corresponds to the
derivative of the spread with respect to the order size:

\[ \mathrm{PI} = \frac{\bar{S}^{e}(\mathrm{big}) - \bar{S}^{e}(\mathrm{small})}{\bar{Q}(\mathrm{big}) - \bar{Q}(\mathrm{small})} \]

where \(\bar{S}^{e}(\mathrm{big})\) and \(\bar{Q}(\mathrm{big})\) (resp. \(\bar{S}^{e}(\mathrm{small})\) and \(\bar{Q}(\mathrm{small})\)) are the average of the effective
spread and the average of order size for big trades (resp. small trades). This measure is
however difficult to implement, because we need to split all the trades into big and small
orders, and it is therefore very sensitive to the choice of the size threshold. A more
interesting and popular market-impact measure is the Amihud measure defined by:

\[ \mathrm{ILLIQ} = \frac{1}{n_t} \sum_{t} \frac{\left| R_{t,t+1} \right|}{V_t} \]

where \(R_{t,t+1}\) is the daily return, \(V_t\) is the daily trading volume and \(n_t\) is the number of
days used to calculate the sum. Amihud (2002) uses this ratio to measure the relationship
between the (absolute) price slope and the order flow.
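The Amihud measure is straightforward to compute from daily data. The sketch below is only illustrative: the function name `amihud_illiq` is our own and the returns and dollar volumes are made-up figures, not market data.

```python
# Illustrative sketch of the Amihud illiquidity measure.
# The sample returns and dollar volumes below are hypothetical.

def amihud_illiq(returns, volumes):
    """Average of |daily return| / daily trading volume over the sample."""
    if len(returns) != len(volumes) or not returns:
        raise ValueError("returns and volumes must be non-empty and of equal length")
    if any(v <= 0 for v in volumes):
        raise ValueError("trading volumes must be strictly positive")
    return sum(abs(r) / v for r, v in zip(returns, volumes)) / len(returns)


daily_returns = [0.010, -0.020, 0.005, -0.015]   # hypothetical daily returns
daily_volumes = [1.0e6, 2.0e6, 0.5e6, 1.5e6]     # hypothetical dollar volumes

illiq = amihud_illiq(daily_returns, daily_volumes)
print(f"ILLIQ = {illiq:.3e}")
```

A higher value means that a given dollar volume moves the price more, so the security is less liquid. Since the raw figure is very small, it is often rescaled (for instance per million dollars traded) before being compared across securities.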


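This liquidity ratio is one of the most popular academic measures, together with the implicit
spread of Roll (1984), who assumes that the fundamental price \(P_t^{\star}\) follows a random walk:

\[ P_t^{\star} = P_{t-1}^{\star} + \varepsilon_t \]

whereas the observed price depends on the trade direction:

\[ P_t = P_t^{\star} + s_t \cdot \left( \frac{S}{2} \right) \]

where \(S\) is the bid-ask spread and:

\[ s_t = \begin{cases} +1 & \text{if the trade is a buy} \\ -1 & \text{if the trade is a sell} \end{cases} \]

We deduce that:

\[ \Delta P_t = \Delta s_t \cdot \left( \frac{S}{2} \right) + \varepsilon_t \]

By assuming that buy and sell orders have the same probability, Roll shows that the
first-order auto-covariance of price changes is equal to:

\[ \operatorname{cov}\left(\Delta P_t, \Delta P_{t-1}\right) = \operatorname{cov}\left(\Delta s_t, \Delta s_{t-1}\right) \cdot \left( \frac{S}{2} \right)^2 = -\left( \frac{S}{2} \right)^2 \]

since \(\operatorname{cov}\left(\Delta s_t, \Delta s_{t-1}\right) = -\operatorname{var}\left(s_{t-1}\right) = -1\) when successive trade directions are independent.
We can therefore deduce the implied spread by the following expression:

\[ \tilde{S} = 2 \sqrt{-\operatorname{cov}\left(\Delta P_t, \Delta P_{t-1}\right)} \]

To estimate \(\tilde{S}\), we can use the empirical covariance of price changes or the Gibbs sampler
proposed by Hasbrouck (2009).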
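The empirical-covariance estimator can be illustrated on simulated data. In the sketch below, the function names, the parameter values (a true spread of 0.10, a volatility of 0.02 for the fundamental price) and the seed are our own assumptions. The simulation follows the assumptions of Roll's model (random walk for the fundamental price, independent buys and sells with equal probability), and it does not implement the Gibbs sampler.

```python
# Illustrative sketch: Roll's implied spread from the first-order
# auto-covariance of price changes, checked on simulated prices.
import math
import random
from statistics import fmean


def roll_implied_spread(prices):
    """Return 2 * sqrt(-cov(dP_t, dP_{t-1})), or None if the covariance is >= 0."""
    dp = [b - a for a, b in zip(prices, prices[1:])]
    x, y = dp[1:], dp[:-1]                     # dP_t and dP_{t-1}
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    if cov >= 0:
        return None                            # the estimator is not defined
    return 2.0 * math.sqrt(-cov)


def simulate_roll_prices(n, spread, sigma, seed=0):
    """Simulate observed prices P_t = P*_t + s_t * spread / 2 (hypothetical parameters)."""
    rng = random.Random(seed)
    p_star = 100.0
    prices = []
    for _ in range(n):
        p_star += rng.gauss(0.0, sigma)        # random walk of the fundamental price
        s = rng.choice((-1, 1))                # buy or sell with equal probability
        prices.append(p_star + s * spread / 2.0)
    return prices


true_spread = 0.10
prices = simulate_roll_prices(n=50_000, spread=true_spread, sigma=0.02)
estimate = roll_implied_spread(prices)
print(f"true spread = {true_spread:.3f}, implied spread = {estimate:.3f}")
```

With real data, the empirical auto-covariance is sometimes positive, in which case the square root is not defined. This is why the function returns `None` in that case instead of raising an error.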


Remark 65 The seminal paper of Roll has been extended in several directions: asymmetric
information, serial dependence of the trades, etc.⁷


7 See Huang and Stoll (1996, 1997) for a survey.

6.1.3 The liquidity-adjusted CAPM


The liquidity-adjusted CAPM is an extension of the capital asset pricing model of Sharpe 
(1964). This model, which has been proposed by Acharya and Pedersen (2005), analyzes 
the relationship between liquidity and asset prices. It goes beyond the traditional approach, 
which consists in considering a static liquidity premium that affects asset returns. In this 
approach, the level of liquidity⁸ is the most important factor to take into account. For
instance, when we consider real assets, we generally consider that their returns must
incorporate a risk premium. A typical example concerns private equity. Most of the
time, when we think about the liquidity premium, we think that it is related to the level of
illiquidity of the asset, and not to the dynamics of the liquidity. However, the issue is more
complex:


“[...] there is also broad belief among users of financial liquidity — traders, in- 
vestors and central bankers — that the principal challenge is not the average level 
of financial liquidity... but its variability and uncertainty” (Persaud, 2003). 


The liquidity-adjusted CAPM (or L-CAPM) considers a framework where both the level of 
liquidity and the variability have an impact on asset prices. 

We note \(R_{i,t}\) and \(L_{i,t}\) the gross return and the relative (stochastic) illiquidity cost of Asset
\(i\). At the equilibrium, Acharya and Pedersen (2005) show that “the CAPM in the imagined
frictionless economy translates into a CAPM in net returns for the original economy with
illiquidity costs”:

\[ \mathbb{E}\left[R_{i,t} - L_{i,t}\right] - r = \tilde{\beta}_i \left( \mathbb{E}\left[R_{m,t} - L_{m,t}\right] - r \right) \qquad (6.1) \]

where \(r\) is the return of the risk-free asset, \(R_{m,t}\) and \(L_{m,t}\) are the gross return and the
illiquidity cost of the market portfolio, and \(\tilde{\beta}_i\) is the liquidity-adjusted beta of Asset \(i\):

\[ \tilde{\beta}_i = \frac{\operatorname{cov}\left(R_{i,t} - L_{i,t}, R_{m,t} - L_{m,t}\right)}{\operatorname{var}\left(R_{m,t} - L_{m,t}\right)} \]

Equation (6.1) shows that the net risk premium of an asset, that is the risk premium minus
the liquidity cost, is equal to its beta times the net market risk premium. However, the
beta in this formula differs from the beta of the CAPM, because it depends
on the liquidity of the asset and the liquidity of the market. Indeed, the liquidity-adjusted
beta can be decomposed into four betas⁹:


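\[ \tilde{\beta}_i = \beta_i + \beta\left(L_{i,t}, L_{m,t}\right) - \beta\left(R_{i,t}, L_{m,t}\right) - \beta\left(L_{i,t}, R_{m,t}\right) \]

where \(\beta_i = \beta\left(R_{i,t}, R_{m,t}\right)\) is the standard market beta, \(\beta\left(L_{i,t}, L_{m,t}\right)\) is the beta associated with
the commonality in liquidity with the market liquidity, \(\beta\left(R_{i,t}, L_{m,t}\right)\) is the beta associated
with the return sensitivity to market liquidity and \(\beta\left(L_{i,t}, R_{m,t}\right)\) is the beta associated with the
liquidity sensitivity to market returns. Therefore, some assets have a low (or high) beta with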

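8 Or more precisely the level of illiquidity.
9 We have:

\[ \begin{aligned}
\operatorname{cov}\left(R_{i,t} - L_{i,t}, R_{m,t} - L_{m,t}\right) &= \mathbb{E}\left[\left(R_{i,t} - L_{i,t}\right) \cdot \left(R_{m,t} - L_{m,t}\right)\right] - \mathbb{E}\left[R_{i,t} - L_{i,t}\right] \cdot \mathbb{E}\left[R_{m,t} - L_{m,t}\right] \\
&= \mathbb{E}\left[R_{i,t} R_{m,t} + L_{i,t} L_{m,t} - R_{i,t} L_{m,t} - L_{i,t} R_{m,t}\right] - \left(\mathbb{E}\left[R_{i,t}\right] - \mathbb{E}\left[L_{i,t}\right]\right) \cdot \left(\mathbb{E}\left[R_{m,t}\right] - \mathbb{E}\left[L_{m,t}\right]\right) \\
&= \operatorname{cov}\left(R_{i,t}, R_{m,t}\right) + \operatorname{cov}\left(L_{i,t}, L_{m,t}\right) - \operatorname{cov}\left(R_{i,t}, L_{m,t}\right) - \operatorname{cov}\left(L_{i,t}, R_{m,t}\right)
\end{aligned} \]

respect to the market portfolio, not because their returns do not covary (or highly covary)
with the returns of the market portfolio, not because of their liquidity level, but because of
the time-variability of the liquidity and the impact of the liquidity on market returns.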
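The decomposition of the liquidity-adjusted beta into four betas can be checked numerically. In the following sketch, the sample series are purely hypothetical and the function names are our own. We also assume that each of the four betas is a covariance divided by the variance of the net market return \(R_{m,t} - L_{m,t}\), which is what makes the four terms add up exactly.

```python
# Illustrative sketch: liquidity-adjusted beta and its four components.
# The return and illiquidity-cost series below are hypothetical.
from statistics import fmean


def cov(x, y):
    """Sample covariance of two series of equal length."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def liquidity_betas(r_i, l_i, r_m, l_m):
    """Return the four betas and the liquidity-adjusted beta."""
    net_i = [r - l for r, l in zip(r_i, l_i)]
    net_m = [r - l for r, l in zip(r_m, l_m)]
    var_net_m = cov(net_m, net_m)          # common denominator var(R_m - L_m)
    return {
        "beta(R_i,R_m)": cov(r_i, r_m) / var_net_m,
        "beta(L_i,L_m)": cov(l_i, l_m) / var_net_m,
        "beta(R_i,L_m)": cov(r_i, l_m) / var_net_m,
        "beta(L_i,R_m)": cov(l_i, r_m) / var_net_m,
        "adjusted": cov(net_i, net_m) / var_net_m,
    }


r_i = [0.020, -0.010, 0.030, -0.040, 0.010, 0.000]   # asset gross returns
l_i = [0.002, 0.004, 0.001, 0.008, 0.003, 0.003]     # asset illiquidity costs
r_m = [0.015, -0.005, 0.020, -0.030, 0.010, 0.002]   # market gross returns
l_m = [0.001, 0.002, 0.001, 0.004, 0.0015, 0.0015]   # market illiquidity costs

betas = liquidity_betas(r_i, l_i, r_m, l_m)
recombined = (betas["beta(R_i,R_m)"] + betas["beta(L_i,L_m)"]
              - betas["beta(R_i,L_m)"] - betas["beta(L_i,R_m)"])
for name, value in betas.items():
    print(f"{name:>14s} = {value: .4f}")
print(f"sum of the four components = {recombined:.4f}")
```

The last line matches the liquidity-adjusted beta up to rounding, which is simply the bilinearity of the covariance.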


Acharya and Pedersen (2005) propose to rewrite Equation (6.1) into the CAPM equation:

\[ \mathbb{E}\left[R_{i,t}\right] - r = \alpha_i + \beta_i \left( \mathbb{E}\left[R_{m,t}\right] - r \right) \qquad (6.2) \]

where \(\alpha_i\) is a function of the relative liquidity of Asset \(i\) with respect to the market portfolio
and the liquidity betas:

\[ \alpha_i = \left( \mathbb{E}\left[L_{i,t}\right] - \tilde{\beta}_i \, \mathbb{E}\left[L_{m,t}\right] \right) + \left( \beta\left(L_{i,t}, L_{m,t}\right) - \beta\left(R_{i,t}, L_{m,t}\right) - \beta\left(L_{i,t}, R_{m,t}\right) \right) \cdot \pi_m \]

where \(\pi_m = \mathbb{E}\left[R_{m,t}\right] - r\). It follows that the asset return can be written as:

\[ R_{i,t} = \alpha_{i,t} + \beta_i \cdot R_{m,t} + \varepsilon_{i,t} \]

where \(\varepsilon_{i,t} \sim \mathcal{N}\left(0, \tilde{\sigma}_i^2\right)\) and \(\tilde{\sigma}_i\) is the specific volatility of the asset. We retrieve the classical
one-factor model, but with a time-varying alpha component. Contrary to the common
wisdom, the alpha of the asset is not only equal to the illiquidity level. Indeed, Acharya and
Pedersen (2005) show that the level of liquidity explains 75% of the alpha, whereas 25% of
the alpha is explained by the liquidity sensitivity of the asset to market returns.


The previous model can also be written as follows:

\[ R_{i,t} - r = \mu\left(L_{i,t}\right) + \left( \mathcal{R}\left(L_{i,t}\right) + \beta_i \right) \cdot \left( R_{m,t} - r \right) + \varepsilon_{i,t} \]

where \(\mu\left(L_{i,t}\right)\) is the relative liquidity level:

\[ \mu\left(L_{i,t}\right) = \mathbb{E}\left[L_{i,t}\right] - \tilde{\beta}_i \, \mathbb{E}\left[L_{m,t}\right] \]

and \(\mathcal{R}\left(L_{i,t}\right)\) is the aggregated liquidity risk:

\[ \mathcal{R}\left(L_{i,t}\right) = \beta\left(L_{i,t}, L_{m,t}\right) - \beta\left(R_{i,t}, L_{m,t}\right) - \beta\left(L_{i,t}, R_{m,t}\right) \]

\(\mathcal{R}\left(L_{i,t}\right)\) is composed of three liquidity covariance risks, and Acharya and Pedersen (2005)
interpret each of them as follows:


1. the first covariance risk \(\beta\left(L_{i,t}, L_{m,t}\right)\) shows that an asset that becomes illiquid when
the market becomes illiquid should have a higher risk premium; this risk is related to
the substitution effects we observe when the market becomes illiquid;


2. the second covariance risk \(\beta\left(R_{i,t}, L_{m,t}\right)\) indicates that assets that perform well in
times of market illiquidity should have a lower risk premium because of the solvency
constraints faced by investors;


3. the third covariance risk \(\beta\left(L_{i,t}, R_{m,t}\right)\) means that investors accept a lower risk
premium on assets that are liquid in a bear market, because they can
be sold in illiquid markets.


It is obvious that these three liquidity risks are correlated and Acharya and Pedersen (2005)
estimate the following correlation figures:

                  | β(L_{i,t}, L_{m,t})   β(R_{i,t}, L_{m,t})   β(L_{i,t}, R_{m,t})
β(L_{i,t}, L_{m,t})  |  100%
β(R_{i,t}, L_{m,t})  |  -57%                  100%
β(L_{i,t}, R_{m,t})  |  -94%                  73%                   100%

The Acharya-Pedersen model illustrates perfectly well some stylized facts concerning some
asset classes such as corporate bonds, small cap stocks or private equities. In particular, 
it shows that liquidity has an impact on the risk premium of securities, but it has also 
an impact on the price dynamics because of the liquidity risk or uncertainty. This implies 
that liquidity has an impact on the systematic return component. We will see later how 
this interconnectedness between asset liquidity and market liquidity is important when 
regulators would like to manage the systemic risk of the financial system. 


6.2 Funding liquidity 


According to Nikolaou (2009), we must distinguish three liquidity types: market liquidity, 
funding liquidity and central bank liquidity. Funding liquidity is the ability of banks to 
meet their liabilities, whereas central bank liquidity is the ability of central banks to supply 
the liquidity needed by the financial system. As noticed by Nikolaou (2009), central bank 
liquidity is not an issue as long as there is a demand for the domestic currency. In this section, 
we focus on funding liquidity, which is in fact the main issue of liquidity risk. Indeed, the
2008 Global Financial Crisis has demonstrated that it is the weak link of the liquidity chain, even
when central bank liquidity is unlimited.


6.2.1 Asset liability mismatch 


Whereas market liquidity is asset specific, funding liquidity is agent specific (Brunner- 
meier and Pedersen, 2009). For instance, we can measure the market liquidity of a stock, 
a bond or a futures contract. In a similar way, we can measure the funding liquidity of a 
bank, an insurer or a corporate firm. We can extend these measures to a portfolio of secu- 
rities or a group of entities. Therefore, we can define the global market liquidity of an asset 
class, for example the liquidity of US large cap stocks or the liquidity of EUR-denominated 
convertible bonds. We can also define the global funding liquidity of a financial system, 
for example the liquidity of Italian banks or the liquidity of the Japanese financial system. 
At first sight, funding liquidity seems to be the mirror image of market liquidity when we
consider banks instead of securities. In fact, this view is false for several reasons. The first
reason concerns the distinction between funding liquidity and funding liquidity risk:


“We define funding liquidity as the ability to settle obligations with immediacy. 
Consequently, a bank is illiquid if it is unable to settle obligations. Legally, a 
bank is then in default. Given this definition we define funding liquidity risk as 
the possibility that over a specific horizon the bank will become unable to settle 
obligations with immediacy” (Drehmann and Nikolaou, 2013, page 2174). 


In the previous section, we have seen several measures of the market liquidity, and these 
measures can be used to calculate the market liquidity risk, that is the market liquidity 
at some time horizon. Funding liquidity is more a binary concept and is related to credit 
risk. Therefore, funding liquidity risk may be viewed as the probability that the bank 
will face a funding liquidity problem in the future. The difficulty is then to make the 
distinction between funding liquidity and credit risks, since their definitions are very close. 
Said differently, the issue is to measure the probability of funding liquidity risk and not the 
probability of default risk (Drehmann and Nikolaou, 2013). 
Drehmann and Nikolaou (2013) consider a stock-flow measure. Let \(D_t\) be the indicator
function, which takes the value 0 if the bank faces no funding liquidity risk, or 1 otherwise.
We have the following relationship:

\[ D_t = 0 \Leftrightarrow O_t \leq I_t + M_t \qquad (6.3) \]

where \(O_t\) are the outflows, \(I_t\) are the inflows and \(M_t\) is the stock of money at time \(t\). The
general components of \(O_t\) and \(I_t\) are:

\[ O_t = L_{\mathrm{new},t} + A_{\mathrm{due},t} + \mathrm{IP}_t \]

and:

\[ I_t = L_{\mathrm{due},t} + A_{\mathrm{new},t} + \mathrm{IR}_t \]

where \(L_{\mathrm{new},t}\) and \(L_{\mathrm{due},t}\) are liabilities which are newly issued or due, \(A_{\mathrm{new},t}\) and \(A_{\mathrm{due},t}\)
are assets which are newly issued or due, and \(\mathrm{IP}_t\) and \(\mathrm{IR}_t\) are interest payments paid or
received by the bank. These outflows/inflows concern five categories: (DP) depositors, (IB)
interbank, (AM) asset market, (OB) off-balance sheet items and (CB) central banks. The
authors then define the net liquidity demand \(\mathrm{NLD}_t = O_t - I_t - M_t\) as the net amount
of central bank money the bank needs to remain liquid and show that this variable must
satisfy the following inequality:

\[ \mathrm{NLD}_t \leq P_t^{\mathrm{DP}} L_{\mathrm{new},t}^{\mathrm{DP}} + P_t^{\mathrm{IB}} L_{\mathrm{new},t}^{\mathrm{IB}} + P_t^{\mathrm{AM}} A_{\mathrm{sold},t} + P_t^{\mathrm{CB}} \mathrm{CB}_{\mathrm{new},t} \qquad (6.4) \]

where \(P_t^{k}\) is the price of the category \(k\), \(L_{\mathrm{new},t}^{\mathrm{DP}}\) and \(L_{\mathrm{new},t}^{\mathrm{IB}}\) correspond to the new liabilities
from depositors and the interbank market, \(A_{\mathrm{sold},t}\) is the amount of assets sold and \(\mathrm{CB}_{\mathrm{new},t}\)
is the new central bank money. Equation (6.4) gives the different components that the bank
can access when \(O_t > I_t + M_t\).
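Relationships (6.3) and (6.4) can be illustrated with a small numerical sketch. All the figures below are hypothetical, and the function names are our own. The "prices" are simply treated as weights applied to each funding source, which is a simplification of the original model.

```python
# Illustrative sketch of the stock-flow measure: indicator D_t, net liquidity
# demand NLD_t and the funding capacity on the right-hand side of (6.4).
# All the amounts and prices are hypothetical.

def funding_indicator(outflows, inflows, money_stock):
    """D_t = 0 if O_t <= I_t + M_t, and 1 otherwise."""
    return 0 if outflows <= inflows + money_stock else 1


def net_liquidity_demand(outflows, inflows, money_stock):
    """NLD_t = O_t - I_t - M_t."""
    return outflows - inflows - money_stock


def funding_capacity(prices, amounts):
    """Sum of price times amount over the funding sources (DP, IB, AM, CB)."""
    return sum(prices[k] * amounts[k] for k in amounts)


o_t, i_t, m_t = 120.0, 90.0, 20.0          # outflows, inflows, stock of money
d_t = funding_indicator(o_t, i_t, m_t)
nld_t = net_liquidity_demand(o_t, i_t, m_t)

prices = {"DP": 1.0, "IB": 1.0, "AM": 0.9, "CB": 1.0}
amounts = {"DP": 2.0, "IB": 3.0, "AM": 5.0, "CB": 1.0}
capacity = funding_capacity(prices, amounts)

print(f"D_t = {d_t}, NLD_t = {nld_t:.1f}, capacity = {capacity:.1f}")
print("inequality (6.4) satisfied:", nld_t <= capacity)
```

In this made-up case, the bank cannot meet its outflows with its inflows and its stock of money alone, but the additional funding sources are just sufficient to cover the net liquidity demand.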


This simple model shows that three dimensions are important when measuring the
funding liquidity risk. First, the time horizon is a key parameter. Second, the projection
of assets and liabilities is not an easy task. This is particularly true if the bank is highly
leveraged or operates a large maturity transformation between assets and liabilities. Third,
we have to take into account spillover effects. Indeed, the bank does not know the reaction
function of the other financial agents if it faces asset/liability liquidity mismatch¹⁰.


6.2.2 Relationship between market and funding liquidity risks 


Brunnermeier and Pedersen (2009) highlight the interconnectedness of market liquidity
and funding liquidity: 


“Traders provide market liquidity, and their ability to do so depends on their 
availability of funding. Conversely, traders’ funding, i.e., their capital and margin 
requirements, depends on the assets’ market liquidity. We show that, under 
certain conditions, margins are destabilizing and market liquidity and funding 
liquidity are mutually reinforcing, leading to liquidity spirals” (Brunnermeier 
and Pedersen, 2009, page 2201). 


The model allows the authors to show that market liquidity can suddenly dry up. This 
analysis has been extended by Nikolaou (2009), who includes the central bank liquidity for 
analyzing the liquidity linkages. In normal times, we observe a virtuous liquidity circle that 
reinforces the financial system stability (Figure 6.3). Nevertheless, the liquidity linkages can 


10 This problem is discussed in Chapter 8, which is dedicated to the systemic risk.

FIGURE 6.3: The liquidity nodes of the financial system


Source: Nikolaou (2009). 


be broken in bad times. It is commonly accepted that funding liquidity is the central node,
because central banks have no interest in breaking the linkages and market illiquidity can
only be temporary. Indeed, market illiquidity is generally associated with periods of selling
(or bear) markets. However, there is an intrinsic imbalance between supply and demand in
financial markets. Investors, financial institutions, households and corporate firms have a 
common objective to buy financial assets, not to sell them, in order to finance retirement 
pensions, future consumptions and economic growth. Therefore, the weak link is the funding 
liquidity, because funding liquidity risks can easily create systemic risks. 


The 2008 Global Financial Crisis is the typical example of a systemic risk crisis, which was
mainly due to liquidity risk, especially funding liquidity risk. We can represent the
different sequences of the crisis with the scheme given in Figure 6.4. The starting point was
the subprime debt crisis, which impacted banks. Therefore, it was first a credit risk crisis.
However, its strength weakened the banks, leading to a reduction of the funding liquidity. At
the same time, the banking system dramatically reduced the funding to corporate firms, asset
managers and hedge funds. In order to obtain cash, investors sold liquid assets, and more
especially stocks. The drop in stock prices affected banks because the value of collateral
portfolios decreased. A feedback loop followed between credit risk, liquidity risk,
market risk and collateral risk.


Remark 66 Many people compare the GFC to the dot-com crisis, certainly because the 
performance of the stock market is similar. Indeed, during the dot-com crisis, the S&P 500 
index experienced a maximum drawdown of about 49% between March 2000 and October 2002,
whereas it was equal to 56% during the subprime crisis. However, the behavior of stocks was 
different during these two periods. During the internet crisis, 55% of stocks posted a negative 
performance, while 45% of stocks posted a positive performance. The dot-com crisis is then 
a crisis of valuation. During the GFC, 95% of stocks posted a negative performance. In fact, 
the 2008 crisis of the stock market is mainly a liquidity crisis. This explains why almost
all stocks had a negative return. This example perfectly illustrates the interconnectedness of
funding liquidity and market risks.
FIGURE 6.4: Spillover effects during the 2008 global financial crisis


6.3 Regulation of the liquidity risk 


In Basel III, the liquidity risk is managed using two layers. First, the market liquidity 
issue is included in the market risk framework by considering five liquidity horizons: 10, 
20, 40, 60 and 120 days. For example, the liquidity horizon for large cap equity prices is 
set to 10 days whereas the liquidity horizon for credit spread volatilities is set to 120 days. 
Therefore, there is a differentiation in terms of asset classes and instruments. Second, the 
Basel Committee has developed two minimum standards for funding liquidity: the liquidity 
coverage ratio (LCR) and the net stable funding ratio (NSFR). The objective of the LCR 
is to promote short-term resilience of the bank’s liquidity risk profile, whereas the objective 
of the NSFR is to promote resilience over a longer time horizon. Moreover, these tools are
complemented by the leverage ratio. Indeed, although its first objective is not to manage the
funding liquidity risk, the leverage risk is an important component of the funding liquidity
risk¹¹.


6.3.1 Liquidity coverage ratio 
6.3.1.1 Definition 
The liquidity coverage ratio is defined as:

\[ \mathrm{LCR} = \frac{\mathrm{HQLA}}{\text{Total net cash outflows}} \geq 100\% \]

11 See Section 8.1.2 on page 456.

TABLE 6.4: Stock of HQLA

Level / Description                                                   Haircut
Level 1 assets                                                        0%
    Coins and bank notes
    Sovereign, central bank, PSE, and MDB assets
      qualifying for 0% risk weighting
    Central bank reserves
    Domestic sovereign or central bank debt for
      non-0% risk weighting
Level 2 assets (maximum of 40% of HQLA)
  Level 2A assets                                                     15%
    Sovereign, central bank, PSE and MDB assets
      qualifying for 20% risk weighting
    Corporate debt securities rated AA- or higher
    Covered bonds rated AA- or higher
  Level 2B assets (maximum of 15% of HQLA)
    RMBS rated AA or higher                                           25%
    Corporate debt securities rated between A+ and BBB-               50%
    Common equity shares                                              50%

Source: BCBS (2013a).


where the numerator is the stock of high quality liquid assets (HQLA) in stressed conditions, 
and the denominator is the total net cash outflows over the next 30 calendar days. The 
underlying idea of the LCR is that the bank has sufficient liquid assets to meet its liquidity 
needs for the next month. 

An asset is considered to be a high quality liquid asset if it can be easily converted into
cash. Therefore, the concept of HQLA is related to asset quality and asset liquidity. Here is
the list of characteristics used by the Basel Committee for defining HQLA:


• fundamental characteristics (low risk, ease and certainty of valuation, low correlation
with risky assets, listed on a developed and recognized exchange);


• market-related characteristics (active and sizable market, low volatility, flight to
quality).


BCBS (2013a) divides the stock of HQLA into two buckets (see Table 6.4). The first bucket 
(called level 1 assets) has a 0% haircut. It consists of coins and banknotes, central bank 
reserves, and qualifying marketable securities from sovereigns, central banks, public sector 
entities (PSE), and multilateral development banks (MDB), whose risk weight is 0% under 
the Basel II SA framework for credit risk. The level 1 assets also include sovereign or 
central bank debt securities issued in the domestic currency of the bank’s home country. 
In the second bucket (also called level 2 assets), assets have a haircut higher than 0%. For
instance, a 15% haircut is applied to sovereign, central bank, PSE and MDB assets that
have a 20% risk weight under the Basel II SA framework for credit risk. A 15% haircut is
also valid for corporate debt securities that are rated at least AA-. Three other types of
assets are included in the second bucket (level 2B assets). They concern RMBS rated AA or
higher, corporate debt securities with a rating between A+ and BBB-, and common equity
shares that belong to a major stock index. Moreover, the HQLA portfolio must be well
diversified in order to avoid concentration (except for sovereign debt of the bank’s home
country).


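We notice that level 2 assets are subject to two caps. Let $x_{\mathrm{HQLA}}$, $x_1$ and $x_2$ be the value
of HQLA, level 1 assets and level 2 assets. We have:

$$
\begin{aligned}
x_{\mathrm{HQLA}} &= x_1 + x_2 \\
x_2 &= x_{2A} + x_{2B} \\
\text{s.t.} \quad x_2 &\leq 0.40 \cdot x_{\mathrm{HQLA}} \\
x_{2B} &\leq 0.15 \cdot x_{\mathrm{HQLA}}
\end{aligned}
$$

We deduce that one trivial solution is:

$$
\begin{aligned}
x_{\mathrm{HQLA}}^{\star} &= \min\left(\tfrac{5}{3}\, x_1,\; x_1 + x_2\right) \\
x_1^{\star} &= x_1 \\
x_2^{\star} &= x_{\mathrm{HQLA}}^{\star} - x_1^{\star} \\
x_{2A}^{\star} &= \min\left(x_2^{\star},\; x_{2A}\right) \\
x_{2B}^{\star} &= x_2^{\star} - x_{2A}^{\star}
\end{aligned}
\tag{6.5}
$$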


Example 63 We consider the following assets: (1) coins and bank notes = $200 mn, (2) 
central bank reserves = $100 mn, (3) 20% risk-weighted sovereign debt securities = $200 
mn, (4) AA corporate debt securities = $300 mn, (5) qualifying RMBS = $200 mn and (6) 
BB+ corporate debt securities = $500 mn. 


Results are given in the table below. We notice that the gross value of assets is equal to 
$1.5 bn. However, level 2 assets represent 80% of this amount, implying that the 40% cap 
is exceeded. Therefore, we have to perform the correction given by Equation (6.5). Finally, 
the stock of HQLA is equal to $500 mn. 


Assets                      Gross value   Haircut   Net value   Capped value
Level 1 assets (1) + (2)         300         0%        300          300
Level 2 assets                  1200                   825          200
  2A: (3) + (4)                  500        15%        425          200
  2B: (5) + (6)                  700                   400            0
    (5)                          200        25%        150            0
    (6)                          500        50%        250            0
Total                           1500                  1125          500
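
The correction of Equation (6.5) is mechanical and can be checked numerically. The following Python sketch applies it to the assets of Example 63. The function name `hqla_stock` and its argument names are illustrative assumptions, the inputs are the values after haircut, and the sketch adds no check beyond what Equation (6.5) states.

```python
def hqla_stock(level1, level2a, level2b):
    """Apply the trivial solution of Equation (6.5) to post-haircut values.

    Returns the capped stock of HQLA and the capped level 2A and 2B amounts.
    """
    level2 = level2a + level2b
    # Level 2 assets cannot exceed 40% of HQLA, so HQLA <= (5/3) * level 1
    hqla = min(5 * level1 / 3, level1 + level2)
    capped_level2 = hqla - level1
    capped_2a = min(capped_level2, level2a)
    capped_2b = capped_level2 - capped_2a
    return hqla, capped_2a, capped_2b


# Example 63 (amounts in $ mn), haircuts of 0%, 15%, 25% and 50%
level1 = 200 + 100
level2a = (1 - 0.15) * (200 + 300)
level2b = (1 - 0.25) * 200 + (1 - 0.50) * 500

hqla, capped_2a, capped_2b = hqla_stock(level1, level2a, level2b)
print(f"HQLA = {hqla:.0f}, level 2A = {capped_2a:.0f}, level 2B = {capped_2b:.0f}")
```

The binding term is the first argument of the minimum, which is why the stock of HQLA depends only on the level 1 assets in this example.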


Remark 67 The previous example shows that the bank may use secured funding transac- 
tions (repos) to circumvent the caps on level 2 assets. This is why the LCR requires adjusting 
the amount of HQLA by taking into account the unwind of repos maturing within 30 calendar 
days that involve the exchange of HQLA¹².


The value of total net cash outflows is defined as follows: 


$$
\text{Total net cash outflows} = \text{Total expected cash outflows} - \min\left(\text{Total expected cash inflows},\; 75\% \text{ of total expected cash outflows}\right)
$$


¹² See §48 and Annex 1 in BCBS (2013a).

TABLE 6.5: Cash outflows of the LCR

Liabilities / Description                                              Rate
Retail deposits
  Demand and term deposits (less than 30 days)
    Stable deposits covered by deposit insurance                         3%
    Stable deposits                                                      5%
    Less stable deposits                                                10%
  Term deposits (with residual maturity greater than 30 days)            0%
Unsecured wholesale funding
  Demand and term deposits (less than 30 days) provided by
  small business customers
    Stable deposits                                                      5%
    Less stable deposits                                                10%
  Deposits generated by clearing, custody and cash management           25%
    Portion covered by deposit insurance                                 5%
  Cooperative banks in an institutional network                         25%
  Corporates, sovereigns, central banks, PSEs and MDBs                  40%
    Portion covered by deposit insurance                                20%
Secured funding transactions
  With a central bank counterparty                                       0%
  Backed by level 1 assets                                               0%
  Backed by level 2A assets                                             15%
  Backed by non-level 1 or non-level 2A assets with domestic
  sovereigns, PSEs or MDBs as a counterparty                            25%
  Backed by level 2B RMBS assets                                        25%
  Backed by other level 2B assets                                       50%
  All other secured funding transactions                               100%
Additional requirements
  Margin/collateral calls                                             > 20%
  ABCP, SIVs, conduits, SPVs, etc.                                     100%
  Net derivative cash outflows                                         100%
  Other credit/liquidity facilities                                    > 5%


Source: BCBS (2013a). 


Therefore, it is the difference between cash outflows and cash inflows, but with a floor of 
25% of cash outflows. Cash outflows/inflows are estimated by applying a run-off/flow-in 
rate to each category of liabilities/receivables. 

In Table 6.5, we report the main categories of cash outflows and their corresponding run- 
off rates. These outflow rates are calibrated according to expected stability or ‘stickiness’. 
Run-off rates range from 3% to 100%, depending on the nature of the funding. The less stable 
funding is perceived to be, the higher the outflow rate. For example, a 3% rate is assigned 
to stable retail deposits that benefit from deposit insurance (protection offered by government
or public guarantee schemes). On the contrary, the rate of deposits from corporates is equal 
to 40%. 

The categories of cash inflows are given in Table 6.6. Maturing secured lending transac- 
tions (reverse repos and securities borrowing) have inflow rates from 0% to 100%. Amounts 
receivable from retail and corporate counterparties have an inflow rate of 50%, whereas 
amounts receivable from financial institutions have an inflow rate of 100%.

TABLE 6.6: Cash inflows of the LCR

Receivables / Description                                              Rate
Maturing secured lending transactions
  Backed by level 1 assets                                               0%
  Backed by level 2A assets                                             15%
  Backed by level 2B RMBS assets                                        25%
  Backed by other level 2B assets                                       50%
  Backed by non-HQLAs                                                  100%
Other cash inflows
  Credit/liquidity facilities provided to the bank                       0%
  Inflows to be received from retail counterparties                     50%
  Inflows to be received from non-financial wholesale counterparties    50%
  Inflows to be received from financial institutions and
  central banks                                                        100%
  Net derivative receivables                                           100%


Source: BCBS (2013a). 


Example 64 The bank has $500 mn of HQLA. Its main liabilities are: (1) retail stable 
deposit = $17.8 bn ($15 bn have a government guarantee), (2) retail term deposit (with a 
maturity of 6 months) = $5 bn, (3) stable deposit provided by small business customers 
= $1 bn, and (4) deposit of corporates = $200 mn. In the next thirty days, the bank also 
expects to receive $100 mn of loan repayments, and $10 mn due to a maturing derivative. 


We first calculate the expected cash outflows for the next thirty days: 


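$$
\begin{aligned}
\text{Cash outflows} &= 3\% \times 15\,000 + 5\% \times 2\,800 + 0\% \times 5\,000 + 5\% \times 1\,000 + 40\% \times 200 \\
&= \$720 \text{ mn}
\end{aligned}
$$

We then estimate the cash inflows expected by the bank for the next month:

$$
\text{Cash inflows} = 50\% \times 100 + 100\% \times 10 = \$60 \text{ mn}
$$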


Finally, we deduce that the liquidity coverage ratio of the bank is equal to: 


$$
\mathrm{LCR} = \frac{500}{720 - 60} = 75.76\%
$$
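
The calculation of Example 64 can be reproduced with a few lines of Python. This is only a sketch: the function names and the representation of liabilities and receivables as (rate, amount) pairs are assumptions made for illustration, while the rates come from Tables 6.5 and 6.6.

```python
def net_cash_outflows(outflows, inflows, inflow_cap=0.75):
    """Outflows minus inflows, where inflows are capped at 75% of outflows."""
    return outflows - min(inflows, inflow_cap * outflows)


def liquidity_coverage_ratio(hqla, outflows, inflows):
    """Stock of HQLA divided by total net cash outflows over 30 days."""
    return hqla / net_cash_outflows(outflows, inflows)


# Example 64 (amounts in $ mn), as (run-off rate, amount) pairs
liabilities = [
    (0.03, 15_000),  # stable retail deposits with a government guarantee
    (0.05, 2_800),   # other stable retail deposits
    (0.00, 5_000),   # retail term deposits with a maturity of 6 months
    (0.05, 1_000),   # stable deposits of small business customers
    (0.40, 200),     # deposits of corporates
]
# (flow-in rate, amount) pairs
receivables = [
    (0.50, 100),     # loan repayments
    (1.00, 10),      # maturing derivative
]

outflows = sum(rate * amount for rate, amount in liabilities)
inflows = sum(rate * amount for rate, amount in receivables)
lcr = liquidity_coverage_ratio(500, outflows, inflows)
print(f"Outflows = {outflows:.0f}, inflows = {inflows:.0f}, LCR = {lcr:.2%}")
```

Here the 75% cap on inflows is not binding, so the net cash outflows are simply the difference between outflows and inflows.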


6.3.1.2 Monitoring tools 


In addition to the LCR, the Basel Committee has defined five monitoring tools in order 
to analyze the bank’s liquidity risk management: 


1. Contractual maturity mismatch

2. Concentration of funding

3. Available unencumbered assets

4. LCR by significant currency

5. Market-related monitoring tools




The contractual maturity mismatch defines “the gaps between the contractual inflows 
and outflows of liquidity for defined time bands”. The Basel Committee suggests the follow- 
ing time buckets: overnight, 7 days, 14 days, 1, 2, 3, 6 and 9 months, 1, 3, 5 and 5+ years. 
The goal of the second metric is to identify the main sources of liquidity problems. Thus, 
the concentration of funding groups three types of information: 


1. funding liabilities sourced from each significant counterparty as a % of total liabilities; 


2. funding liabilities sourced from each significant product/instrument as a % of total 
liabilities; 


3. list of asset and liability amounts by significant currency. 


The third metric concerns available unencumbered assets that are marketable as collateral 
in secondary markets, and available unencumbered assets that are eligible for central banks’ 
standing facilities. The bank must then report the amount, type and location of available 
unencumbered assets that can be used as collateral assets. For the fourth metric, the bank 
must calculate a ‘foreign currency LCR’ for all the significant currencies. A currency is said
to be significant if the liabilities denominated in that currency are larger than 5% of the
total liabilities. Supervisors are in charge of producing the last metric, which corresponds
to market data that can serve as warning indicators of liquidity risks (CDS spread of the 
bank, trading volume in equity markets, bid/ask spreads of sovereign bonds, etc.). These 
indicators can be specific to the bank, the financial sector, or a financial market. 
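
As an illustration of the fourth metric, the following Python sketch identifies the significant currencies from a breakdown of liabilities. The figures and the function name `significant_currencies` are hypothetical, and the 5% threshold is applied as a strict inequality, following the definition given above.

```python
def significant_currencies(liabilities_by_currency, threshold=0.05):
    """Return the currencies whose liabilities exceed `threshold` of the total."""
    total = sum(liabilities_by_currency.values())
    return [
        currency
        for currency, amount in liabilities_by_currency.items()
        if amount > threshold * total
    ]


# Hypothetical liabilities by currency (in $ mn equivalent)
liabilities = {"USD": 700.0, "EUR": 250.0, "JPY": 40.0, "CHF": 10.0}
significant = significant_currencies(liabilities)
print(significant)
```

With these figures, the bank would compute a foreign currency LCR only for the currencies returned by the function.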


6.3.2 Net stable funding ratio 


While the LCR measures the funding liquidity risk for the next month, NSFR is designed 
in order to promote resilience of the bank’s liquidity profile for the next year. Like the LCR, 
NSFR is based on the asset liability approach, but it is more comprehensive than the LCR 
because of the long-term horizon. In some sense, it is closer to the framework that has been 
proposed by Drehmann and Nikolaou (2013). 


6.3.2.1 Definition 


It is defined as the amount of available stable funding (ASF) relative to the amount of 
required stable funding (RSF): 


$$
\mathrm{NSFR} = \frac{\text{Available amount of stable funding}}{\text{Required amount of stable funding}} \geq 100\%
$$


The available amount of stable funding (ASF) corresponds to the regulatory capital plus 
some other liabilities, whereas the required amount of stable funding (RSF) is the sum of 
weighted assets and off-balance sheet exposures. We have: 


$$
\mathrm{ASF} = \sum_{i} f_i^{\mathrm{ASF}} \cdot L_i
$$

and:

$$
\mathrm{RSF} = \sum_{j} f_j^{\mathrm{RSF}} \cdot A_j
$$

where $f_i^{\mathrm{ASF}}$ is the ASF factor for liability $i$, $L_i$ is the amount of liability $i$, $f_j^{\mathrm{RSF}}$ is the RSF
factor for asset $j$, and $A_j$ is the amount of asset $j$.

6.3.2.2 ASF and RSF factors


The ASF factor can take 5 values: 100%, 95%, 90%, 50% and 0%. Here are the main 
components of the available amount of stable funding: 


Liabilities receiving a 100% ASF factor 

This concerns (1) regulatory capital (excluding tier 2 instruments with residual ma- 
turity of less than one year) and (2) other capital instruments with effective residual 
maturity of one year or more. 


Liabilities receiving a 95% ASF factor 
This includes (3) stable non-maturity/term deposits of retail and small business cus- 
tomers (with residual maturity of less than one year). 


Liabilities receiving a 90% ASF factor 
This corresponds to (4) less stable non-maturity/term deposits of retail and small 
business customers (with residual maturity of less than one year). 


Liabilities receiving a 50% ASF factor 

In this category, we find (5) funding provided by sovereigns, corporates, MDBs and 
PSEs (with residual maturity of less than one year), (6) funding provided by central 
banks and financial institutions (with residual maturity between 6 months and one 
year) and (7) operational deposits. 


Liabilities receiving a 0% ASF factor 
This category corresponds to (8) all the other liabilities. 


The RSF factor takes values between 0% and 100%. The main components of the required 
amount of stable funding are the following: 


Assets receiving a 0% RSF 
This concerns (1) coins and banknotes and (2) all central bank reserves and all claims 
on central banks with residual maturities of less than six months. 


Assets receiving a 5% RSF 
In this category, we find (3) other unencumbered level 1 assets. 


Assets receiving a 10% RSF 
This includes (4) unencumbered secured loans to financial institutions with residual 
maturities of less than six months. 


Assets receiving a 15% RSF 
This category is composed of (5) all other unencumbered loans to financial institutions
with residual maturities of less than six months and (6) unencumbered level 2A assets. 


Assets receiving a 50% RSF 
This corresponds to (7) unencumbered level 2B assets and (8) all other assets with 
residual maturity of less than one year. 


Assets receiving a 65% RSF 

This concerns (9) unencumbered residential mortgages and loans (excluding loans to 
financial institutions) with a residual maturity of one year or more and with a risk 
weight of less than or equal to 35% under the Standardized Approach.


Assets receiving an 85% RSF
In this category, we have (10) cash, securities or other assets posted as initial margin
for derivative contracts and provided to contribute to the default fund of a CCP,
(11) other unencumbered performing loans (excluding loans to financial institutions)
with a residual maturity of one year or more and with a risk weight greater
than 35% under the Standardized Approach, (12) exchange-traded equities and (13)
physical traded commodities, including gold.


Assets receiving a 100% RSF
This category is defined by (14) all assets that are encumbered for a period of one year
or more and (15) all other assets (non-performing loans, loans to financial institutions
with a residual maturity of one year or more, non-exchange-traded equities, etc.).


Example 65 We assume that the bank has the following simplified balance sheet: 


Assets                   Amount | Liabilities              Amount
Loans   Residential         150 | Deposits  Stable            100
        Corporate            60 |           Less stable       150
Level   1                    70 | Short-term borrowing         50
        2B                   40 | Capital                      20


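We deduce that:

$$
\mathrm{ASF} = 95\% \times 100 + 90\% \times 150 + 50\% \times 50 + 100\% \times 20 = 275
$$

and:

$$
\mathrm{RSF} = 85\% \times 150 + 85\% \times 60 + 5\% \times 70 + 50\% \times 40 = 202
$$

The NSFR is then equal to:

$$
\mathrm{NSFR} = \frac{275}{202} = 136\%
$$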
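
The same computation can be sketched in Python. The helper `weighted_sum` and the (factor, amount) representation are illustrative assumptions; the factors are those used in Example 65.

```python
def weighted_sum(items):
    """Sum of factor * amount over a list of (factor, amount) pairs."""
    return sum(factor * amount for factor, amount in items)


# Example 65: (ASF factor, liability amount)
asf_items = [
    (0.95, 100),  # stable deposits
    (0.90, 150),  # less stable deposits
    (0.50, 50),   # short-term borrowing
    (1.00, 20),   # capital
]
# Example 65: (RSF factor, asset amount)
rsf_items = [
    (0.85, 150),  # residential loans
    (0.85, 60),   # corporate loans
    (0.05, 70),   # level 1 assets
    (0.50, 40),   # level 2B assets
]

asf = weighted_sum(asf_items)
rsf = weighted_sum(rsf_items)
nsfr = asf / rsf
print(f"ASF = {asf:.0f}, RSF = {rsf:.0f}, NSFR = {nsfr:.0%}")
```

Changing a single factor in `rsf_items` shows directly how sensitive the ratio is to the classification of an asset.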


6.3.3 Leverage ratio 


As said previously, the leverage ratio completes the framework of market and liquidity
risks. It is defined as the capital measure divided by the exposure measure. Since January
2018, this ratio must be at least 3%. The capital measure corresponds to the tier 1 capital,
while the exposure measure is composed of four main exposures: on-balance sheet exposures,
derivative exposures, securities financing transaction (SFT) exposures and off-balance sheet
items. The main issue is the definition of derivative exposures, because we can adopt either a
notional or a mark-to-market approach. In the end, the Basel Committee has decided to define
them as the sum of replacement cost and potential future exposure, meaning that derivative
exposures correspond to a CCR exposure measure.
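
A minimal Python sketch of this definition is given below. All amounts are hypothetical and the function name and arguments are assumptions made for illustration. Derivative exposures are taken as the sum of replacement cost and potential future exposure, and the result is compared with the 3% threshold, which is a minimum requirement under Basel III.

```python
def leverage_ratio(tier1_capital, on_balance_sheet, derivatives_rc,
                   derivatives_pfe, sft, off_balance_sheet):
    """Tier 1 capital divided by the sum of the four exposure components."""
    derivative_exposures = derivatives_rc + derivatives_pfe
    exposure_measure = (
        on_balance_sheet + derivative_exposures + sft + off_balance_sheet
    )
    return tier1_capital / exposure_measure


# Hypothetical bank (amounts in $ bn)
ratio = leverage_ratio(
    tier1_capital=40,
    on_balance_sheet=900,
    derivatives_rc=20,
    derivatives_pfe=30,
    sft=100,
    off_balance_sheet=50,
)
meets_minimum = ratio >= 0.03
print(f"Leverage ratio = {ratio:.2%}, meets the 3% minimum: {meets_minimum}")
```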


Remark 68 We could have discussed the leverage ratio in other chapters, in particular
when considering systemic risk and the shadow banking system. In fact, liquidity, leverage and
systemic risks are so connected that it is difficult to distinguish them.


Chapter 7


Asset Liability Management Risk 


Asset liability management (ALM) corresponds to the processes that address the mismatch 
risk between assets and liabilities. These methods concern financial institutions, which are 
mainly defined by a balance sheet. For example, this is the case of pension funds and 
insurance companies. In this chapter, we focus on ALM risks in banks, and more precisely 
ALM risks of the banking book. Previously, we have already seen some risks that impact 
the banking book such as credit or operational risk. In what follows, we consider the four 
specific ALM risks: liquidity risk¹, interest rate risk, option risk and currency risk.

Generally, ALM risks are rarely taught in university faculties because they are less well known
to academics. In fact, asset liability management is a mix of actuarial science, accounting
and statistical modeling, and seems at first sight less mathematical than risk management.
Another difference is that the ALM function is generally within the finance department
and not within the risk management department. This is because ALM implies taking
decisions that are not purely related to risk management considerations, but also concern
commercial choices and business models.


7.1 General principles of the banking book risk management 


Before presenting the tools to manage ALM risks, we outline the main features of
asset and liability management. In particular, we show why ALM risks are so specific if we
compare them to market and credit risks. In fact, asset and liability management has two 
components. The first component is well-identified and corresponds to the risk measurement 
of ALM operations. The second component is much more vague, because it concerns both 
risk management and business development. Indeed, banking business is mainly a financial 
intermediation business, since banks typically tend to borrow short term and lend long 
term. The mismatch between assets and liabilities is then inherent to banking activities. 
Similarly, the balance sheet of a bank and its income statement are highly related, implying 
that future income may be explained by the current balance sheet. The debate on whether 
the ALM department is a profit center summarizes this duality between risk and business 
management. 


¹ Liquidity risk was the subject of the previous chapter. However, we have discussed this topic from a
risk management point of view by focusing on the regulatory ratios (LCR and NSFR). In this chapter, we
tackle the issue of liquidity risk from an ALM perspective.

7.1.1 Definition
7.1.1.1 Balance sheet and income statement 


The ALM core function is to measure the asset liability mismatch of the balance sheet 
of the bank. In Table 7.1, we report the 2018 balance sheet of FDIC-insured commercial 
banks and savings institutions as provided by FDIC (2019). It concerns 5406 financial 
institutions in the US. We notice that the total assets and liabilities are equal to $17.9 
tn. The most important items are loans and leases, investment securities and cash & due 
from depository institutions on the asset side, deposits and equity capital on the liability 


TABLE 7.1: Assets and liabilities of FDIC-insured commercial banks and savings institu- 
tions (Amounts in $ bn) 


Total Assets Total liabilities and capital 
Loans secured by real estate Deposits 

1-4 Family residential mortgages Foreign office deposits 
Nonfarm nonresidential Domestic office deposits 
Construction and development Interest-bearing deposits 
Home equity lines Noninterest-bearing deposits 
Multifamily residential real estate Estimated insured deposits 
Farmland Time deposits 
Real estate loans in foreign offices Brokered deposits 


Commercial & industrial loans 
Loans to individuals 

Credit cards 

Other loans to individuals 

Auto loans 

Farm loans 
Loans to depository institutions 
Loans to foreign gov. & official inst. 
Obligations of states in the U.S. 


Federal funds purchased & repos 
FHLB advances 
Other borrowed money 
Subordinated debt 
Trading account liabilities 
Other liabilities 
Total liabilities 
Total equity capital 
Total bank equity capital 


Other loans Perpetual preferred stock 
Lease financing receivables Common stock 


Gross total loans and leases Surplus 


Less: Unearned income 
Total loans and leases 
Less: Reserve for losses 
Net loans and leases 
Securities 
Available for sale (fair value) 
Held to maturity (amortized cost) 
U.S. Treasury securities 
Mortgage-backed securities 
State and municipal securities 
Equity securities 
Cash & due from depos. instit. 
Fed. funds sold and reverse repos 
Bank premises and fixed assets 
Other real estate owned 
Trading account assets 
Intangible assets 
Goodwill 
Other Assets 


Undivided profits 
Other comprehensive income 
Net unrealized P&L on AFS 


Source: Federal Deposit Insurance Corporation (2019), www.fdic.gov/bank/analytical/qbp.

side. Table 7.2 shows a simplified version of the balance sheet. The bank collects retail and
corporate deposits and lends money to households and firms. 


TABLE 7.2: A simplified balance sheet 


Assets                   | Liabilities
Cash                     | Due to central banks
Loans and leases         | Deposits
  Mortgages              |   Deposit accounts
  Consumer credit        |   Savings
  Credit cards           |   Term deposits
Interbank loans          | Interbank funding
Investment securities    | Short-term debt
  Sovereign bonds        | Subordinated debt
  Corporate bonds        | Reserves
Other assets             | Equity capital


Some deposits have a fixed maturity (e.g. a certificate of deposit), while others have an 
undefined maturity. This is for example the case of demand deposits or current accounts. 
These liabilities are then called non-maturity deposits (NMD), and include transaction 
deposits, NOW (negotiable order of withdrawal) accounts, money market deposit accounts 
and savings deposits. Term deposits (also known as time deposits or certificates of deposit)
are deposits with a fixed maturity, implying that the customer cannot withdraw their funds
before the term ends. Generally, the bank considers that the core deposits correspond to
deposits of the retail customers and are a stable source of its funding. On the asset side,
the bank offers credit, loans and leases, and holds securities and other assets such as
real estate, intangible assets² and goodwill³. In Chapter 3 on page 125, we have seen that
loans concern individuals, corporates and sovereigns. We generally distinguish loans
secured by real estate, consumer loans, commercial and industrial loans. Leases correspond
to contract agreements, where the bank purchases the asset on behalf of the customer, and
the customer uses the asset in return and pays to the bank a periodic lease payment for
the duration of the agreement⁴. Investment securities include repos, sovereign bonds, asset-
backed securities, debt instruments and equity securities. We reiterate that the balance sheet
does not include off-balance sheet items. Indeed, the risk of credit lines (e.g. commitments,
standby facilities or letters of credit) is measured by the credit risk⁵, while derivatives
(swaps, forwards, futures and options) are mainly managed within the market risk and the
counterparty credit risk.


Another difference between assets and liabilities is that they are not ‘priced’ at the 
same interest rate since the primary business of the bank is to capture the interest rate 
spread between its assets and its liabilities. The bank receives income from the loans and 
its investment portfolio, whereas the expenses of the bank concern the interest it pays 
on deposits and its debt, and the staff and operating costs. In Table 7.3, we report the 
2018 income statement of FDIC-insured commercial banks and savings institutions. We 
can simplify the computation of this income statement and obtain the simplified version 


² Intangible assets are non-physical assets that have a multi-period useful life such as servicing rights or
customer lists. They are also intellectual assets (patents, copyrights, software, etc.).

³ Goodwill is the excess of the purchase price over the fair market value of the net assets acquired. The
difference can be explained because of the brand name, good customer relations, etc.

⁴ At the end of the contract, the customer may have the option to buy the asset.

⁵ In this case, the difficult task is to estimate the exposure at default and the corresponding CCF parameter.

TABLE 7.3: Annual income and expense of FDIC-insured commercial banks and savings
institutions (Amounts in $ mn)


Total interest income 
Domestic office loans 
Foreign office loans 
Lease financing receivables 
Balances due from depository institutions 
Securities 
Trading accounts 
Federal funds sold 
Other interest income 
Total interest expense 
Domestic office deposits 
Foreign office deposits 
Federal funds purchased 
Trading liabilities and other borrowed money 
Subordinated notes and debentures 
Net interest income 
Provision for loan and lease losses 
Total noninterest income 
Fiduciary activities 
Service charges on deposit accounts 
Trading account gains and fees 
Interest rate exposures 
Foreign exchange exposures 
Equity security and index exposures 
Commodity and other exposures 
Credit exposures 
Investment banking, advisory, brokerage 
and underwriting fees and commissions 
Venture capital revenue 
Net servicing fees 
Net securitization income 
Insurance commission fees and income 
Net gains (losses) on sales of loans 
Net gains (losses) on sales of other real estate owned 
Net gains (losses) on sales of other assets (except securities) 
Other noninterest income 
Total noninterest expense 
Salaries and employee benefits 
Premises and equipment expense 
Other noninterest expense 
Amortization expense and goodwill impairment losses 
Securities gains (losses) 
Income (loss) before income taxes and extraordinary items 
Applicable income taxes 
Extraordinary gains (losses), net 
Net charge-offs 
Cash dividends 
Retained earnings 
Net operating income 


Source: Federal Deposit Insurance Corporation (2019), www.fdic.gov/bank/analytical/qbp.

given in Table 7.4. Net interest income corresponds to the income coming from interest
rates, whereas non-interest income is mainly generated by service fees and commissions. 
The income statement depends of course on the balance sheet items, but also on off-balance 
sheet items. Generally, loans, leases and investment securities are called the earning assets, 
whereas deposits are known as interest bearing liabilities. 


TABLE 7.4: A simplified income statement 


Interest income 
— Interest expenses 
= Net interest income 
+ Non-interest income 
= Gross income 
— Operating expenses 
= Net income 
— Provisions 
= Earnings before tax 
— Income tax 
= Profit after tax 


7.1.1.2 Accounting standards 


We understand that the goal of ALM is to control the risk of the balance sheet in order to 
manage and secure the future income of the bank. However, the ALM policy is constrained 
by accounting standards since the bank must comply with some important rules that
distinguish banking and trading books. Accounting systems differ from one country to
another, but we generally distinguish four main systems: US GAAP⁶, Japanese combined
system⁷, Chinese accounting standards and International Financial Reporting Standards
(or IFRS). IFRS are standards issued by the IFRS Foundation and the International
Accounting Standards Board (IASB) to provide a global accounting system for business affairs
and capital markets. In March 2019, there were 144 jurisdictions that required the use of
IFRS Standards for publicly listed companies and 12 jurisdictions that permitted its use.
IFRS is then the world’s most widely used framework. For example, it is implemented in
the European Union, Australia, the Middle East, Russia, South Africa, etc. Since January 2018,
IFRS 9 has replaced IAS 39, which was considered excessively complicated and inappropriate.


Financial instruments IAS 39 required financial assets to be classified in the four fol- 
lowing categories: 


• financial assets at fair value through profit and loss (FVTPL);

• available-for-sale financial assets (AFS);

• loans and receivables (L&R);

• held-to-maturity investments (HTM).


⁶ GAAP stands for Generally Accepted Accounting Principles.

⁷ Companies may choose one of the four accepted financial reporting frameworks: Japanese GAAP (which
is the most widespread system), IFRS standards, Japan’s modified international standards (JMIS) and US
GAAP.

The FVTPL category had two subcategories. The first subcategory (designated) included any
financial asset that was designated on initial recognition as one to be measured at fair
value with fair value changes in profit and loss. The second subcategory (held-for-trading or
HFT) included financial assets that were held for trading. Depending on the category, the
bank measured the financial asset using the fair value approach⁸ (AFS and FVTPL) or the
amortized cost approach (L&R and HTM). In IFRS 9, the financial assets are divided into
two categories:


• amortized cost (AC);

• fair value (FV).


For FV assets, we distinguish fair value through profit and loss (FVTPL) and fair value
through other comprehensive income (FVOCI). Category changes between AC, FVTPL
and FVOCI are recognized when the asset is derecognized or reclassified. In fact, the
classification of an asset depends on two tests: the business model (BM) test and the solely
payments of principal and interest (SPPI) test. In the BM test, the question is to know “if
the objective of the bank is to hold the financial asset to collect the contractual cash flows”
or not. In the SPPI test, the question is rather to understand if “the contractual terms of the
financial asset give rise on specified dates to cash flows that are solely payments of principal
and interest on the principal amount outstanding”. It is obvious that the classification of
an asset affects the ALM policy because it impacts the income statement differently.


On the liability side, there is little difference between IAS 39 and IFRS 9. All equity
investments are measured at fair value, HFT financial liabilities are measured at FVTPL
and all other financial liabilities are measured at amortized cost unless the fair value option is
applied.


Remark 69 The main revision of IFRS 9 concerns impairment of financial assets since
it establishes new models of expected credit loss for receivables and loans. This implies that
banks can calculate loss provisioning as soon as the loan enters the banking book.


Hedging instruments Hedge accounting is an option and not an obligation. It considers 
that some financial assets are not held for generating P&L, but are used in order to offset 
a given risk. This implies that the hedging instrument is fully related to the hedged item. 
IAS 39 and IFRS 9 recognize three hedging strategies: 


• a fair value hedge (FVH) is a hedge of the exposure to changes in fair value of a
recognized asset or liability;

• a cash flow hedge (CFH) is a hedge of the exposure to variability in cash flows that
is attributable to a particular risk;

• a net investment hedge (NIH) concerns currency risk hedging.


In the case of FVH, changes in the fair value of both the hedging instrument and the hedged item are
recognized in profit and loss. In the case of CFH or NIH, the effective portion of the gain
or loss on the hedging instrument is recognized in equity (other comprehensive income⁹ or
OCI), while the ineffective portion of the gain or loss on the hedging instrument is recognized
in profit and loss.


⁸ In the AFS case, gains and losses impact the equity capital and then the balance sheet, whereas gains
and losses of FVTPL assets directly concern the income statement.

⁹ See Table 7.1 on page 370.

7.1.1.3 Role and importance of the ALCO


Remark 69 shows that IFRS 9 contributes to the convergence of risk, finance and
accounting that we have recently observed. In fact, ALM is at the junction of these three concepts.
This is why there is room for debate on how to organize the ALM function. Traditionally, it is located
in the finance department because the ALM committee (ALCO) is in charge of both risk
management and income management. In particular, it must define the funds transfer
pricing (FTP) policy. Indeed, resources concerning interest and liquidity risks are transferred
from business lines to the ALM portfolio. The ALCO and the ALM unit are in charge of
managing the risks of this portfolio and allocating the P&L across business lines:


“A major purpose of internal prices is to determine the P&L of the business 
lines. Transfer prices are internal prices of funds charged to business units or 
compensating cheap resources such as deposits. [...] Transfer pricing systems 
are notably designed for the banking book, for compensating resources collected 
from depositors and for charging funds used for lending. Internal prices also 
serve for exchanging funds between units with deficits of funds and units with 
excesses of funds. As they are used for calculating the P&L of a business line, 
they perform income allocation across business lines” (Bessis, 2015, pages 109-110).


[Figure 7.1 diagram. Labels: Business Line A (Funding Excess, Funding Price); Business Line B (Funding Deficit, Funding Cost).]

FIGURE 7.1: Internal and external funding transfer


This means that business lines with a funding excess will provide the liquidity to business
lines with a funding deficit. For example, Figure 7.1 shows the relationships between the
ALM unit and two business lines A and B. In this case, business line A must be rewarded
and receives the funding price, whereas business line B pays the funding cost. The internal funds
transfer system prevents business lines A and B from going directly to the market. However,
the ALM unit has access to the market for both lending the funding liquidity excess and
borrowing the funding liquidity deficit of the bank. At first sight, we can assume that the
internal funding price is equal to the external funding price and the internal funding cost is
equal to the external funding cost. In this case, the ALM unit captures the bid/ask spread
of the funding liquidity. In real life, this is not possible and not necessarily desirable.
First, we reiterate that the goal of a bank is to perform liquidity transformation. This
means that the liquidity excess of business line A does not necessarily match the liquidity
deficit of business line B. Second, the role of the ALM is also to be sure that business
lines can pilot their commercial development. In this situation, it is important that internal
funding prices and costs are less volatile than external funding prices and costs in order to
better stabilize commercial margins. Since the funds transfer pricing policy is decided by
the ALCO, we notice that the role of ALM cannot be reduced to a risk management issue.
Even if the risk transfer is intentionally rational and fair, meaning that internal prices are 
related to market prices, the ALM remains a business issue because the assets and liabilities 
are generally not tradable and there are not always real market prices for these items. For 
example, what is the price of a $100 deposit? It depends on the behavior of the customer, 
but also on the risk appetite of the bank. What is the margin of a $100 loan? It is not 
the spread between the loan interest rate and the market interest rate, because there is no 
perfect matching between the two interest rates. In this case, the margin will depend on 
the risk management policy. This duality between income generation and risk management 
is the specificity of asset liability management. Therefore, the role of the ALCO is essential 
for a bank, because it impacts the risk management of its balance sheet, but also the income 
generated by its banking book. 


7.1.2 Liquidity risk 


In this section, we define the concept of liquidity gap, which is the main tool for mea- 
suring the ALM liquidity risk. In particular, we make the distinction between static and 
dynamic liquidity gap when we consider the new production and future projections. In order 
to calculate liquidity gaps, we also need to understand asset and liability amortization, and 
liquidity cash flow schedules. Finally, we present liquidity hedging tools, more precisely the 
standard instruments for managing the ALM liquidity risk. 


7.1.2.1 Definition of the liquidity gap 


Basel III uses two liquidity ratios (LCR and NSFR), which are related to the ALM 
liquidity risk. More generally, financial institutions (banks, insurance companies, pension 
funds and asset managers) manage funding risks by considering funding ratios or funding 
gaps. The general expression of a funding ratio is: 


$$ \mathcal{FR}(t) = \frac{A(t)}{L(t)} \tag{7.1} $$

where A(t) is the value of assets and L(t) is the value of liabilities at time t, while the
funding gap is defined as the difference between asset value and liability value:

$$ \mathcal{FG}(t) = A(t) - L(t) \tag{7.2} $$


If FR(t) > 1 or FG (t) > 0, the financial institution does not need funding because the 
selling of the assets covers the repayment of the liabilities. Equations (7.1) and (7.2) corre- 
spond to the bankruptcy or the liquidation point of view: if we stop the activity, are there 
enough assets to meet the liability requirements of the financial institution? Another point 
of view is to consider that the case A(t) > L(t) requires financing the gap A (t) — L (t), 
implying that the financial institution has to raise liability funding to match the assets. 


From that point of view, Equations (7.1) and (7.2) become¹⁰:

$$ \mathcal{LR}(t) = \frac{L(t)}{A(t)} \tag{7.3} $$

and:

$$ \mathcal{LG}(t) = L(t) - A(t) \tag{7.4} $$

In what follows, we consider the liquidity gap LG(t) instead of the funding gap FG(t),
meaning that a positive (resp. negative) gap corresponds to a liquidity excess (resp. liquidity
deficit).

10 We use the letter $\mathcal{L}$ (liquidity) instead of $\mathcal{F}$ (funding) in order to distinguish between the two
definitions.


Example 66 We consider a simplified balance sheet with few items. The assets A(t) are
composed of loans that are linearly amortized on a monthly basis during the next year. Their
value is equal to 120. The liabilities L(t) are composed of three short-term in fine debt
instruments, and the capital. The corresponding debt notional is respectively equal to 65, 10
and 5 whereas the associated remaining maturity is equal to two, seven and twelve months.
The amount of capital is stable for the next twelve months and is equal to 40.


In Table 7.5, we have reported the asset and liability values A(t) and L(t). Since the
loans are linearly amortized on a monthly basis, A(t) is equal to 110 after one month, 100
after two months, etc. The value of the first debt instrument remains 65 for the first and
second months, and is then equal to zero because the maturity has expired. It follows that
the value of the total debt is a piecewise constant function. It is equal to 80 until two months,
15 between three and seven months and 5 after. We can then calculate the liquidity gap. At
the initial date, it is equal to zero by definition. At time t = 1, we deduce that LG(1) = +10
because we have A(1) = 110 and L(1) = 120.


TABLE 7.5: Computation of the liquidity gap 


Period          0    1    2    3    4    5    6    7    8    9   10   11   12
Loans         120  110  100   90   80   70   60   50   40   30   20   10    0
Assets        120  110  100   90   80   70   60   50   40   30   20   10    0
Debt #1        65   65   65
Debt #2        10   10   10   10   10   10   10   10
Debt #3         5    5    5    5    5    5    5    5    5    5    5    5    5
Debt (total)   80   80   80   15   15   15   15   15    5    5    5    5    5
Equity         40   40   40   40   40   40   40   40   40   40   40   40   40
Liabilities   120  120  120   55   55   55   55   55   45   45   45   45   45
LG(t)           0   10   20  -35  -25  -15   -5    5    5   15   25   35   45


The time profile of the liquidity gap is given in Figure 7.2. We notice that it is positive
at the beginning, implying that the bank has an excess of liquidity funding in the short run.
Then, we observe that the liquidity gap is negative and the bank needs liquidity funding.
From the seventh month, the liquidity gap becomes positive again. At the end, the liquidity
gap is always positive since assets and liabilities are fully amortized, implying that the
balance sheet is only composed of the capital.
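
The computation of Table 7.5 can be sketched in a few lines of Python. This is only an illustrative reconstruction of Example 66: the variable and function names are ours, and the in fine debts are assumed to keep their notional up to and including their maturity month, as in the table.

```python
# Illustrative reconstruction of Example 66 (names are our own, not the book's).
periods = list(range(13))  # months 0..12

# Loans of 120 amortized linearly over twelve months
assets = [120 - 10 * t for t in periods]


def in_fine(notional, maturity):
    """Value of an in fine debt: full notional up to its maturity month, zero after."""
    return [notional if t <= maturity else 0 for t in periods]


# Three debt instruments (65, 10 and 5) maturing in 2, 7 and 12 months
debt = [sum(v) for v in zip(in_fine(65, 2), in_fine(10, 7), in_fine(5, 12))]
capital = 40
liabilities = [d + capital for d in debt]

# Liquidity gap LG(t) = L(t) - A(t), Equation (7.4)
liquidity_gap = [l - a for l, a in zip(liabilities, assets)]

for t in periods:
    print(t, assets[t], liabilities[t], liquidity_gap[t])
```

The printed rows correspond to the Assets, Liabilities and LG(t) lines of Table 7.5.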


7.1.2.2 Asset and liability amortization 


In order to calculate liquidity gaps, we need to understand the amortization of assets 
and liabilities, in particular the amortization of loans, mortgages, bonds and other debt 
instruments. The general rules applied to debt payment are the following: 


• The annuity amount A(t) at time t is composed of the interest payment I(t) and the
principal payment P(t):

$$ A(t) = I(t) + P(t) $$

This implies that the principal payment at time t is equal to the annuity A(t) minus
the interest payment I(t):

$$ P(t) = A(t) - I(t) $$

It corresponds to the principal or the capital which is amortized at time t.

• The interest payment at time t is equal to the interest rate i(t) times the outstanding
principal balance (or the remaining principal) at the end of the previous period
N(t-1):

$$ I(t) = i(t) N(t-1) $$

• The outstanding principal balance N(t) is the remaining amount due. It is equal
to the previous outstanding principal balance N(t-1) minus the principal payment
P(t):

$$ N(t) = N(t-1) - P(t) \tag{7.5} $$

At the initial date t = 0, the outstanding principal balance is equal to the notional
of the debt instrument. At the maturity t = n, we must verify that the remaining
amount due is equal to zero.

• The outstanding principal balance N(t) is equal to the present value C(t) of forward
annuity amounts:

$$ N(t) = C(t) $$

FIGURE 7.2: An example of liquidity gap


We can distinguish different types of debt instruments. For instance, we can assume that the 
capital is linearly amortized meaning that the principal payment P (t) is constant over time 
(constant amortization debt). We can also assume that the annuity amount A (t) is constant 
during the life of the debt instrument (constant payment debt). In this case, the principal 
payment P (t) is an increasing function with respect to the time t. Another amortization 
scheme corresponds to the case where the notional is fully repaid at the time of maturity
(bullet repayment debt). This is for example the case of a zero-coupon bond. 


Let us consider the case where the interest rate i(t) is constant. For the constant
amortization debt, we have:

$$ P(t) = \frac{1}{n} N_0 $$

where n is the number of periods and $N_0$ is the notional of the mortgage. The cumulative
principal payment Q(t) is equal to:

$$ Q(t) = \sum_{s \le t} P(s) = \frac{t}{n} N_0 $$

We deduce that the outstanding principal balance N(t) verifies:

$$ N(t) = N_0 - Q(t) = \left(1 - \frac{t}{n}\right) N_0 $$

We also have I(t) = i C(t-1) where C(t-1) = N(t-1) and:

$$ A(t) = I(t) + P(t) = \left(\frac{1}{n} + i \left(1 - \frac{t-1}{n}\right)\right) N_0 $$

In Exercise 7.4.1 on page 449, we derive the formulas of the constant payment debt. The
constant annuity is equal to:

$$ A(t) = A = \frac{i}{1 - (1+i)^{-n}} N_0 $$

The interest payment is:

$$ I(t) = \left(1 - \frac{1}{(1+i)^{n-t+1}}\right) A $$

and the principal payment:

$$ P(t) = \frac{1}{(1+i)^{n-t+1}} A $$

Moreover, we show that the outstanding principal balance N(t) verifies:

$$ N(t) = \left(\frac{1 - (1+i)^{-(n-t)}}{i}\right) A $$

Finally, in the case of the bullet repayment debt, we have $I(t) = i N_0$, $P(t) = \mathbb{1}\{t = n\} \cdot N_0$,
$A(t) = I(t) + P(t)$ and $N(t) = \mathbb{1}\{t \neq n\} \cdot N_0$.


Example 67 We consider a 10-year mortgage, whose notional is equal to $100. The annual 
interest rate i is equal to 5%, and we assume annual principal payments. 


Results are given in Tables 7.6, 7.7 and 7.8. For each payment structure, we have reported
the value of the remaining capital C(t-1) at the beginning of the period, the annuity paid
at time t, the split between the interest payment I(t) and the principal payment P(t), and the
cumulative principal payment Q(t). When calculating liquidity gaps, the most important
quantity is the outstanding principal balance N(t) given in the last column, because it
corresponds to the amortization of the debt.
TABLE 7.6: Repayment schedule of the constant amortization mortgage


 t   C(t-1)     A(t)    I(t)     P(t)     Q(t)     N(t)
 1   100.00    15.00    5.00    10.00    10.00    90.00
 2    90.00    14.50    4.50    10.00    20.00    80.00
 3    80.00    14.00    4.00    10.00    30.00    70.00
 4    70.00    13.50    3.50    10.00    40.00    60.00
 5    60.00    13.00    3.00    10.00    50.00    50.00
 6    50.00    12.50    2.50    10.00    60.00    40.00
 7    40.00    12.00    2.00    10.00    70.00    30.00
 8    30.00    11.50    1.50    10.00    80.00    20.00
 9    20.00    11.00    1.00    10.00    90.00    10.00
10    10.00    10.50    0.50    10.00   100.00     0.00


TABLE 7.7: Repayment schedule of the constant payment mortgage


 t   C(t-1)     A(t)    I(t)     P(t)     Q(t)     N(t)
 1   100.00    12.95    5.00     7.95     7.95    92.05
 2    92.05    12.95    4.60     8.35    16.30    83.70
 3    83.70    12.95    4.19     8.77    25.06    74.94
 4    74.94    12.95    3.75     9.20    34.27    65.73
 5    65.73    12.95    3.29     9.66    43.93    56.07
 6    56.07    12.95    2.80    10.15    54.08    45.92
 7    45.92    12.95    2.30    10.65    64.73    35.27
 8    35.27    12.95    1.76    11.19    75.92    24.08
 9    24.08    12.95    1.20    11.75    87.67    12.33
10    12.33    12.95    0.62    12.33   100.00     0.00


TABLE 7.8: Repayment schedule of the bullet repayment mortgage


 t   C(t-1)     A(t)    I(t)     P(t)     Q(t)     N(t)
 1   100.00     5.00    5.00     0.00     0.00   100.00
 2   100.00     5.00    5.00     0.00     0.00   100.00
 3   100.00     5.00    5.00     0.00     0.00   100.00
 4   100.00     5.00    5.00     0.00     0.00   100.00
 5   100.00     5.00    5.00     0.00     0.00   100.00
 6   100.00     5.00    5.00     0.00     0.00   100.00
 7   100.00     5.00    5.00     0.00     0.00   100.00
 8   100.00     5.00    5.00     0.00     0.00   100.00
 9   100.00     5.00    5.00     0.00     0.00   100.00
10   100.00   105.00    5.00   100.00   100.00     0.00
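
The three repayment schedules of Example 67 can be generated with the recursion N(t) = N(t-1) - P(t) given in Equation (7.5). The following Python sketch is only illustrative: the function name `schedule` and the labels `"amortization"`, `"payment"` and `"bullet"` are our own, and a constant interest rate is assumed.

```python
def schedule(kind, notional=100.0, rate=0.05, n=10):
    """Rows (t, C(t-1), A(t), I(t), P(t), Q(t), N(t)) for one payment structure.

    kind is a hypothetical label: 'amortization' (constant principal),
    'payment' (constant annuity) or 'bullet' (repayment at maturity).
    """
    # Constant annuity A = i / (1 - (1 + i)^(-n)) * N0, used by 'payment' only
    annuity = rate / (1.0 - (1.0 + rate) ** (-n)) * notional
    rows, balance, cumulative = [], notional, 0.0
    for t in range(1, n + 1):
        interest = rate * balance              # I(t) = i * N(t-1)
        if kind == "amortization":
            principal = notional / n           # P(t) = N0 / n
        elif kind == "payment":
            principal = annuity - interest     # P(t) = A - I(t)
        elif kind == "bullet":
            principal = notional if t == n else 0.0
        else:
            raise ValueError(f"unknown payment structure: {kind}")
        cumulative += principal                # Q(t)
        rows.append((t, balance, interest + principal, interest,
                     principal, cumulative, balance - principal))
        balance -= principal                   # N(t) = N(t-1) - P(t)
    return rows


for kind in ("amortization", "payment", "bullet"):
    print(kind)
    for row in schedule(kind):
        print("  ".join(f"{x:8.2f}" for x in row))
```

Rounded to two decimals, the rows should agree with Tables 7.6, 7.7 and 7.8.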
Previously, we have assumed that the payment type is annual, but we can consider other
periods for the amortization schedule. The most common frequencies are monthly, quarterly,
semi-annual and annual¹¹. Let i be the annual interest rate and p the frequency or the
number of compounding periods per year. The consistency principle of the accumulation
factor implies the following identity:

$$ 1 + i = \left(1 + \frac{i^{(p)}}{p}\right)^{p} $$

where $i^{(p)}$ is the nominal interest rate expressed on a yearly basis. For example, if the
nominal interest rate $i^{(\mathrm{monthly})}$ is equal to 12%, the borrower pays a monthly interest rate
of 1%, which corresponds to an annual interest rate of 12.6825%.


Remark 70 The interest rate i is also called the annual equivalent rate (AER) or the 
effective annual rate (EAR). 
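
The conversion between the nominal rate and the annual equivalent rate is easy to check numerically. The sketch below applies the consistency identity of the accumulation factor; the two function names are ours.

```python
def effective_annual_rate(nominal_rate, periods_per_year):
    """Annual rate i implied by the nominal rate i^(p) compounded p times a year."""
    return (1.0 + nominal_rate / periods_per_year) ** periods_per_year - 1.0


def nominal_rate(annual_rate, periods_per_year):
    """Inverse mapping: nominal rate i^(p) consistent with the annual rate i."""
    return periods_per_year * ((1.0 + annual_rate) ** (1.0 / periods_per_year) - 1.0)


# A 12% nominal rate paid monthly (1% per month), as in the text
print(f"{effective_annual_rate(0.12, 12):.4%}")
```

With a nominal rate of 12% and monthly compounding, we retrieve the annual rate of about 12.6825% quoted above.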


Example 68 We consider a 30-year mortgage, whose notional is equal to $100. The annual 
interest rate i is equal to 5%, and we assume monthly principal payments. 


This example is a variant of the previous example, since the maturity is longer and
equal to 30 years, and the payment schedule is monthly. This implies that the number n
of periods is equal to 360 months and the monthly interest rate is equal to 5%/12 or 41.7
bps. In Figure 7.3, we show the amortization schedule of the mortgage for the three cases:
constant (or linear¹²) amortization, constant payment or annuity and bullet repayment. We
notice that the constant annuity case is located between the constant amortization and the
bullet repayment. We have also reported the constant annuity case when the interest rate
is equal to 10%. We notice that we obtain the following ordering:

$$ i_1 > i_2 \Rightarrow N(t \mid i_1) > N(t \mid i_2) $$

where $N(t \mid i)$ is the outstanding principal balance given the interest rate i. In fact, constant
annuity and constant amortization coincide when the interest rate goes to zero whereas
constant annuity and bullet repayment coincide when the interest rate goes to infinity.
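
This ordering can be checked numerically with the closed-form balance of the constant payment debt. The following sketch assumes, as in Example 68, that the monthly rate is the annual rate divided by 12; the function name `cpm_balance` is ours.

```python
def cpm_balance(month, annual_rate, n=360, notional=100.0):
    """Outstanding balance N(t | i) of a constant payment mortgage after `month` payments."""
    i = annual_rate / 12.0                      # monthly rate (assumption: nominal convention)
    if i == 0.0:
        return notional * (1.0 - month / n)     # limit case: constant amortization
    annuity = i / (1.0 - (1.0 + i) ** (-n)) * notional
    return annuity * (1.0 - (1.0 + i) ** (-(n - month))) / i


for month in (60, 120, 180, 240, 300):
    linear = 100.0 * (1.0 - month / 360)
    print(month, round(linear, 2),
          round(cpm_balance(month, 0.05), 2),
          round(cpm_balance(month, 0.10), 2))
```

For each date, the balance at 10% lies above the balance at 5%, which itself lies above the linear schedule and below the notional of the bullet schedule.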


Example 69 We consider the following simplified balance sheet: 


Assets Liabilities 
Items Notional Rate Mat. Items Notional Rate Mat. 
Loan #1 100 5% 10 Debt #1 120 5% 10 
Loan #2 50 8% 16 Debt #2 80 3% 5 
Loan #3 40 3% 8 Debt #3 70 4% 10 
Loan #4 110 2% 7 | Capital #4 30 


The balance sheet is composed of four asset items and four liability items. Asset items 
correspond to different loans, whose remaining maturity is respectively equal to 10, 16, 8 
and 7 years. Liabilities contain three debt instruments and the capital, which is not amortized 
by definition. All the debt instruments are subject to monthly principal payments. 


In Figure 7.4, we have calculated the liquidity gap for different amortization schedules:
constant principal, constant annuity and bullet repayment at maturity. We notice that
constant principal and constant annuity give similar amortization schedules. This is not the


11 Monthly is certainly the most used frequency for debt instruments. 
12 The two terms constant and linear can be used interchangeably. 
FIGURE 7.3: Amortization schedule of the 30-year mortgage
FIGURE 7.4: Impact of the amortization schedule on the liquidity gap




TABLE 7.9: Computation of the liquidity gap (mixed schedule) 


                  Assets                                Liabilities
  t      #1     #2     #3     #4    A(t) |     #1     #2     #3     #4    L(t) |   LG(t)

Monthly schedule (t in months)
  1    99.4   49.9   39.6    110   298.8 |  119.2   78.7     70     30   297.9 |   -0.92
  2    98.7   49.7   39.2    110   297.6 |  118.5   77.3     70     30   295.8 |   -1.83
  3    98.1   49.6   38.8    110   296.4 |  117.7   76.0     70     30   293.7 |   -2.75
  4    97.4   49.5   38.3    110   295.2 |  116.9   74.7     70     30   291.6 |   -3.66
  5    96.8   49.3   37.9    110   294.0 |  116.1   73.3     70     30   289.4 |   -4.58
  6    96.1   49.2   37.5    110   292.8 |  115.3   72.0     70     30   287.3 |   -5.49
  7    95.4   49.1   37.1    110   291.6 |  114.5   70.7     70     30   285.2 |   -6.41
  8    94.8   48.9   36.7    110   290.4 |  113.7   69.3     70     30   283.1 |   -7.32
  9    94.1   48.8   36.3    110   289.2 |  112.9   68.0     70     30   280.9 |   -8.24
 10    93.4   48.7   35.8    110   287.9 |  112.1   66.7     70     30   278.8 |   -9.15
 11    92.8   48.5   35.4    110   286.7 |  111.3   65.3     70     30   276.7 |  -10.06
 12    92.1   48.4   35.0    110   285.5 |  110.5   64.0     70     30   274.5 |  -10.97

Annual schedule (t in years)
  0   100.0   50.0   40.0    110   300.0 |  120.0   80.0     70     30   300.0 |    0.00
  1    92.1   48.4   35.0    110   285.5 |  110.5   64.0     70     30   274.5 |  -10.97
  2    83.8   46.7   30.0    110   270.4 |  100.5   48.0     70     30   248.5 |  -21.90
  3    75.0   44.8   25.0    110   254.8 |   90.1   32.0     70     30   222.1 |  -32.76
  4    65.9   42.7   20.0    110   238.6 |   79.0   16.0     70     30   195.0 |  -43.55
  5    56.2   40.5   15.0    110   221.7 |   67.4            70     30   167.4 |  -54.27
  6    46.1   38.1   10.0    110   204.2 |   55.3            70     30   155.3 |  -48.91
  7    35.4   35.5    5.0           75.9 |   42.5            70     30   142.5 |   66.56
  8    24.2   32.7                  56.9 |   29.0            70     30   129.0 |   72.12
  9    12.4   29.7                  42.1 |   14.9            70     30   114.9 |   72.81
 10           26.4                  26.4 |                          30    30.0 |    3.62
 11           22.8                  22.8 |                          30    30.0 |    7.19
 12           18.9                  18.9 |                          30    30.0 |   11.06
 13           14.8                  14.8 |                          30    30.0 |   15.24
 14           10.2                  10.2 |                          30    30.0 |   19.77
 15            5.3                   5.3 |                          30    30.0 |   24.68
 16                                  0.0 |                          30    30.0 |   30.00


case of bullet repayment. In the fourth panel, we consider a more realistic situation where
we have constant principal (loan #3 and debt #2), constant annuity (loan #1, loan
#2 and debt #1) and bullet repayment (loan #4 and debt #3). Computation details for
this last mixed schedule are given in Table 7.9. The top panel presents the liquidity gap
LG(t) of the first twelve months while the bottom panel corresponds to the annual schedule.
The top panel is very important since it corresponds to the first year, which is the standard
horizon used by the ALCO for measuring liquidity requirements. We see that the bank will
face a liquidity deficit during the first year.
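
The mixed schedule of Table 7.9 can be approximately reproduced as follows. This is a sketch under our own assumptions: monthly rates are taken as the annual rate divided by 12, a bullet item is worth zero from its maturity date onwards, debt #3 is the bullet liability (as the table indicates), and all function names are hypothetical.

```python
def annuity_balance(notional, annual_rate, years, month):
    """Constant annuity: outstanding balance after `month` monthly payments."""
    i, n = annual_rate / 12.0, 12 * years
    if month >= n:
        return 0.0
    annuity = i / (1.0 - (1.0 + i) ** (-n)) * notional
    return annuity * (1.0 - (1.0 + i) ** (-(n - month))) / i


def principal_balance(notional, years, month):
    """Constant principal: linear amortization of the notional."""
    return max(notional * (1.0 - month / (12 * years)), 0.0)


def bullet_balance(notional, years, month):
    """Bullet repayment: full notional before maturity, zero from maturity onwards."""
    return notional if month < 12 * years else 0.0


def liquidity_gap(month):
    """LG(t) = L(t) - A(t) for the balance sheet of Example 69 (mixed schedule)."""
    assets = (annuity_balance(100, 0.05, 10, month)      # loan #1
              + annuity_balance(50, 0.08, 16, month)     # loan #2
              + principal_balance(40, 8, month)          # loan #3
              + bullet_balance(110, 7, month))           # loan #4
    liabilities = (annuity_balance(120, 0.05, 10, month)  # debt #1
                   + principal_balance(80, 5, month)      # debt #2
                   + bullet_balance(70, 10, month)        # debt #3
                   + 30.0)                                # capital
    return liabilities - assets


for month in list(range(1, 13)) + [12 * y for y in range(2, 17)]:
    print(month, round(liquidity_gap(month), 2))
```

Under these conventions, the first twelve values match the top panel of Table 7.9 and the annual values match the bottom panel, up to rounding.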


The previous analysis does not take into account two important phenomena. The first 
one concerns customer behavioral options such as prepayment decisions. We denote by $N^c(t)$
the conventional outstanding principal balance that takes into account the prepayment risk.
We have:

$$ N^c(t) = N(t) \cdot \mathbb{1}\{\tau > t\} $$

where N(t) is the theoretical outstanding principal balance and $\tau$ is the prepayment time
of the debt instrument. The prepayment time in ALM modeling is equivalent to the survival
or default time that we have seen in credit risk modeling. Then $\tau$ is a random variable,
which is described by its survival function S(t). Let p(t) be the probability that the debt
instrument has not been repaid at time t. We have:

$$ p(t) = \mathbb{E}\left[\mathbb{1}\{\tau > t\}\right] = S(t) $$

By construction, $N^c(t)$ is also random. Therefore, we can calculate its mathematical
expectation, and we have $\bar{N}^c(t) = \mathbb{E}[N^c(t)] = p(t) \cdot N(t)$. For example, if we assume that
$\tau \sim \mathcal{E}(\lambda)$ where $\lambda$ is the prepayment intensity, we obtain $\bar{N}^c(t) = e^{-\lambda t} \cdot N(t)$. By definition,
we always have $N^c(t) \le N(t)$ and $\bar{N}^c(t) \le N(t)$.

In Figure 7.5, we consider the constant payment mortgage given in Example 68 on page
381. The first panel shows the theoretical or contractual outstanding principal balance. In
the second and third panels, we consider that there is a prepayment at time $\tau = 10$ and
$\tau = 20$. The conventional schedule coincides with the contractual schedule, but is equal
to zero once the prepayment time occurs. Finally, the fourth panel presents the conventional
amortization schedule $\bar{N}^c(t)$ when the prepayment time is exponentially distributed.
When $\lambda$ is equal to zero, we retrieve the previous contractual schedule N(t). Otherwise,
the mortgage amortization is quicker.
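
The effect of the prepayment on the amortization schedule can be illustrated with the mortgage of Example 68. The sketch below is only indicative: the function names are ours, the monthly rate is assumed to be 5%/12, and the intensity values are arbitrary.

```python
import math


def contractual_balance(t_years, annual_rate=0.05, years=30, notional=100.0):
    """Contractual balance N(t) of the constant payment mortgage, t in years."""
    i, n = annual_rate / 12.0, 12 * years
    annuity = i / (1.0 - (1.0 + i) ** (-n)) * notional
    return annuity * (1.0 - (1.0 + i) ** (-(n - 12.0 * t_years))) / i


def conventional_balance(t_years, prepayment_time):
    """Balance when the loan is prepaid at a known time: N(t) * 1{tau > t}."""
    return contractual_balance(t_years) if prepayment_time > t_years else 0.0


def expected_conventional_balance(t_years, intensity):
    """Expected balance exp(-lambda * t) * N(t) when tau is exponentially distributed."""
    return math.exp(-intensity * t_years) * contractual_balance(t_years)


for t in (0, 5, 10, 15, 20, 25, 30):
    print(t,
          round(contractual_balance(t), 2),
          round(conventional_balance(t, prepayment_time=10), 2),
          round(expected_conventional_balance(t, intensity=0.10), 2))
```

The second column is the contractual schedule, the third one drops to zero once the prepayment time of 10 years is reached, and the last one is the expected schedule for a prepayment intensity of 10%.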
FIGURE 7.5: Conventional amortization schedule with prepayment risk


The second important phenomenon that impacts the amortization schedule is the new
production of assets and liabilities. If we consider a balance sheet item, its outstanding amount
at time t is equal to the outstanding amount at time t - 1 minus the amortization between
t - 1 and t plus the new production at time t:

$$ N(t) = N(t-1) - AM(t) + NP(t) \tag{7.6} $$


This relationship is illustrated in Figure 7.6 and can be considered as an accounting identity
(Demey et al., 2003). In the case where there is no prepayment, the amortization AM(t)
is exactly equal to the principal payment P(t) and we retrieve Equation (7.5) except for the
term NP(t). However, there is a big difference between Equations (7.5) and (7.6). The first
one describes the amortization of a debt instrument, for example a loan or a mortgage. The
FIGURE 7.6: Impact of the new production on the outstanding amount


Source: Demey et al. (2003). 


second one describes the amortization of a balance sheet item, that is, the aggregation of
several debt instruments. The new production NP(t) corresponds to the financial
transactions that appear in the balance sheet between t - 1 and t. They concern the new credit
lines, customer loans, mortgages, deposits, etc. that have been traded by the bank during
the last period [t - 1, t]. The introduction of the new production leads to the concept of
dynamic liquidity gap, in contrast to the static liquidity gap.


Remark 71 As we will see in the next section, dynamic liquidity analysis is more
complex since the function NP(t) is not always known and depends on many parameters.
Said differently, NP(t) is more a random variable. However, it is more convenient to treat
NP(t) as a deterministic function than a stochastic function in order to obtain closed-form
formulas and avoid using Monte Carlo methods¹³.


7.1.2.3 Dynamic analysis 
According to BCBS (2016d) and EBA (2018a), we must distinguish three types of
analysis:


• Run-off balance sheet
A balance sheet where existing non-trading book positions amortize and are not
replaced by any new business.

• Constant balance sheet
A balance sheet in which the total size and composition are maintained by replacing
maturing or repricing cash flows with new cash flows that have identical features.

• Dynamic balance sheet
A balance sheet incorporating future business expectations, adjusted for the relevant
scenario in a consistent manner.


13 Equation (7.6) can also be written as follows:

$$ NP(t) = N(t) - \left(N(t-1) - AM(t)\right) $$

Written in this form, this equation indicates how to calculate the new production. In particular, this
relationship can be used to define an estimator of NP(t).


The run-off balance sheet analysis has been presented in the previous section. The constant
or dynamic balance sheet analysis assumes that we include the new production when
calculating the liquidity gap. For the constant analysis, this task is relatively easy since we
consider a like-for-like replacement of assets and liabilities. The dynamic analysis is more
difficult to implement because it highly depends “on key variables and assumptions that
are extremely difficult to project with accuracy over an extended period and can potentially
hide certain key underlying risk exposures” (BCBS, 2016d, page 8).


Stock-flow analysis According to Demey et al. (2003), the non-static analysis requires
a mathematical framework in order to distinguish stock and flow streams. We follow these
authors, and more particularly we present the tools introduced in Chapter 1 of their book.
We denote by NP(t) the new production at time t and by NP(t,u) the part of this production¹⁴
that is still reported in the balance sheet at time u > t. The amortization function S(t,u)
is defined by the following equation:

$$ NP(t,u) = NP(t) \cdot S(t,u) $$

The amortization function is in fact a survival function, implying that the following
properties hold: $S(t,t) = 1$, $S(t,\infty) = 0$ and S(t,u) is a decreasing function with respect to
u. The amortization function is homogeneous if we have S(t,u) = S(u - t) for all u > t.
Otherwise, the amortization function is non-homogeneous and may depend on the information
$\mathcal{I}_{t:u}$ between t and u. In this case, we can write $S(t,u) = S(t,u;\mathcal{I}_{t:u})$ where $\mathcal{I}_{t:u}$ may
contain the trajectory of interest rates, the history of prepayment times, etc. We define the
amortization rate as the hazard rate associated to the survival function S(t,u):

$$ \lambda(t,u) = -\frac{\partial \ln S(t,u)}{\partial u} $$


In management, we generally make the distinction between stock and flow streams,
but we know that the stock at time t is the sum of past flows. In the case of ALM, the
outstanding amount plays the role of stock while the new production corresponds to a flow.
Therefore, the outstanding amount at time t is the sum of past productions that are still
present in the balance sheet at time t:

$$ N(t) = \int_0^{\infty} NP(t-s, t) \, \mathrm{d}s $$

It follows that:

$$ N(t) = \int_0^{\infty} NP(t-s) \, S(t-s, t) \, \mathrm{d}s = \int_{-\infty}^{t} NP(s) \, S(s,t) \, \mathrm{d}s \tag{7.7} $$


14 We have $NP(t) = NP(t,t)$ and $NP(t,\infty) = 0$.
TABLE 7.10: Relationship between the new production and the outstanding amount


   s   NP(s)   S(s,7)  NP(s,7)   S(s,10)  NP(s,10)   S(s,12)  NP(s,12)
   1     110    0.301    33.13     0.165     18.18     0.111     12.19
   2     125    0.368    45.98     0.202     25.24     0.135     16.92
   3      95    0.449    42.69     0.247     23.43     0.165     15.70
   4      75    0.549    41.16     0.301     22.59     0.202     15.14
   5     137    0.670    91.83     0.368     50.40     0.247     33.78
   6     125    0.819   102.34     0.449     56.17     0.301     37.65
   7     115    1.000   115.00     0.549     63.11     0.368     42.31
   8     152                       0.670    101.89     0.449     68.30
   9     147                       0.819    120.35     0.549     80.68
  10     159                       1.000    159.00     0.670    106.58
  11     152                                           0.819    124.45
  12     167                                           1.000    167.00
 N(t)                   472.14              640.36              720.69
The outstanding amount N(t) at time t is then the sum of each past production NP(s)
times its amortization function S(s,t). In Table 7.10, we provide an example of calculating
the outstanding amount using the previous convolution method. In the second column,
we report the production of each year s. We assume that the amortization function is
homogeneous and is an exponential distribution with an intensity $\lambda$ equal to 20%. The
third and fourth columns give the values of the amortization function and the production
that is present in the balance sheet at time t = 7. We obtain N(7) = 472.14. The last four
columns correspond to the cases t = 10 and t = 12.
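
The convolution of Table 7.10 can be sketched as a discrete sum over past productions. The code below is illustrative only: names are ours, and the production of year 4 is taken to be 75, which is the value implied by the NP(4,t) columns of the table.

```python
import math

# Yearly new production NP(s); year 4 is set to 75, the value implied by NP(4, t)
new_production = {1: 110, 2: 125, 3: 95, 4: 75, 5: 137, 6: 125,
                  7: 115, 8: 152, 9: 147, 10: 159, 11: 152, 12: 167}
intensity = 0.20  # exponential amortization parameter


def survival(s, t):
    """Homogeneous exponential amortization function S(s, t) = exp(-lambda (t - s))."""
    return math.exp(-intensity * (t - s))


def outstanding_amount(t):
    """N(t): sum of past productions still present in the balance sheet at time t."""
    return sum(np_s * survival(s, t) for s, np_s in new_production.items() if s <= t)


for t in (7, 10, 12):
    print(t, round(outstanding_amount(t), 2))
```

The three values correspond to the last row of Table 7.10.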


Demey et al. (2003) introduce the concept of stock amortization. We recall that the
amortization function S(t,u) indicates the proportion of $1 entering the balance sheet at
time t that remains present at time u > t. Similarly, the stock amortization function S*(t,u)
measures the proportion of $1 of outstanding amount at time t that remains present at time
u > t. In order to obtain an analytical and tractable function S*(t,u), we must assume that
the new production is equal to zero after time t. This corresponds to the run-off balance
sheet analysis. Demey et al. (2003) show that the non-amortized outstanding amount is
equal to:

$$ N(t,u) = \int_{-\infty}^{t} NP(s) \, S(s,u) \, \mathrm{d}s $$

where t is the current time and u is the future date. For instance, N(5,10) indicates the
outstanding amount that is present in the balance sheet at time t = 5 and will remain in
the balance sheet five years after. It follows that:

$$ N(t,u) = N(t) \cdot S^{\star}(t,u) $$

and we deduce that:

$$ S^{\star}(t,u) = \frac{N(t,u)}{N(t)} = \frac{\int_{-\infty}^{t} NP(s) \, S(s,u) \, \mathrm{d}s}{\int_{-\infty}^{t} NP(s) \, S(s,t) \, \mathrm{d}s} $$
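
The ratio defining the stock amortization function can be evaluated numerically. The sketch below is only an illustration under our own assumptions: a constant new production, a linear amortization over m = 10 years, and a midpoint-rule integration (the function names are hypothetical). In this setting the result can be compared with the closed form (1 - (u - t)/m)² of the constant amortization scheme.

```python
def linear_survival(s, u, m=10.0):
    """Constant (linear) amortization: S(s, u) = 1 - (u - s)/m on [s, s + m), zero after."""
    x = u - s
    return 1.0 - x / m if 0.0 <= x < m else 0.0


def stock_survival(t, u, survival, new_production, lookback, steps=20000):
    """S*(t, u) = int NP(s) S(s, u) ds / int NP(s) S(s, t) ds over [t - lookback, t]."""
    h = lookback / steps
    numerator = denominator = 0.0
    for k in range(steps):
        s = t - lookback + (k + 0.5) * h       # midpoint of the k-th subinterval
        numerator += new_production(s) * survival(s, u) * h
        denominator += new_production(s) * survival(s, t) * h
    return numerator / denominator


# Production older than m years is fully amortized, so a lookback of m is enough
value = stock_survival(t=0.0, u=4.0, survival=linear_survival,
                       new_production=lambda s: 1.0, lookback=10.0)
print(round(value, 4), round((1.0 - 4.0 / 10.0) ** 2, 4))
```

With a flat new production, the stock amortizes faster than a single new loan because part of the stock is already close to maturity.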


Dynamics of the outstanding amount Using Equation (7.7), we obtain¹⁵:

$$ \frac{\mathrm{d}N(t)}{\mathrm{d}t} = -\int_{-\infty}^{t} NP(s) \, f(s,t) \, \mathrm{d}s + NP(t) \tag{7.8} $$

where $f(t,u) = -\partial_u S(t,u)$ is the density function of the amortization. This is the
continuous-time version of the amortization schedule given by Equation (7.6):

$$ N(t) - N(t-1) = -AM(t) + NP(t) $$

where:

$$ AM(t) = \int_{-\infty}^{t} NP(s) \, f(s,t) \, \mathrm{d}s $$
As already said, we notice the central role of the new production when building a dynamic 
gap analysis. It is obvious that the new production depends on several parameters, for 
example the commercial policy of the bank, the competitive environment, etc. 


Estimation of the dynamic liquidity gap We can then define the dynamic liquidity
gap at time t for a future date u > t as follows¹⁶:

$$ \mathcal{LG}(t,u) = \sum_{k \in \text{Liabilities}} \left( N_k(t,u) + \int_t^u NP_k(s) \, S_k(s,u) \, \mathrm{d}s \right) - \sum_{k \in \text{Assets}} \left( N_k(t,u) + \int_t^u NP_k(s) \, S_k(s,u) \, \mathrm{d}s \right) $$

where k represents a balance sheet item. This is the difference between the liability
outstanding amount and the asset outstanding amount. For a given item k, the dynamic outstanding
amount is composed of the outstanding amount $N_k(t,u)$ that will be non-amortized at time
u plus the new production between t and u that will be in the balance sheet at time u.
The difficulty is then to estimate the new production and the amortization function. As 
said previously, the new production generally depends on the business strategy of the bank. 


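15 We have:

$$ \mathrm{d}\left(\int_{-\infty}^{t} NP(s) \, S(s,t) \, \mathrm{d}s\right) = NP(t) \, S(t,t) \, \mathrm{d}t + \left(\int_{-\infty}^{t} NP(s) \frac{\partial S(s,t)}{\partial t} \, \mathrm{d}s\right) \mathrm{d}t = NP(t) \, \mathrm{d}t - \left(\int_{-\infty}^{t} NP(s) \left(-\frac{\partial S(s,t)}{\partial t}\right) \mathrm{d}s\right) \mathrm{d}t $$

16 In the case of the run-off balance sheet, we set $NP_k(s) = 0$ and we obtain the following formula:

$$ \mathcal{LG}(t,u) = \sum_{k \in \text{Liabilities}} N_k(t,u) - \sum_{k \in \text{Assets}} N_k(t,u) $$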
Concerning the amortization function, we can calibrate $S_k(t,u)$ using a sample of new
productions if we assume that the amortization function is homogeneous: $S_k(t,u) = S_k(u-t)$.
It follows that:

$$ \hat{S}_k(u-t) = \frac{\sum_{j \in k} NP_j(t,u)}{\sum_{j \in k} NP_j(t)} $$

Moreover, we can show that $\hat{S}_k(u-t)$ is a convergent estimator and its asymptotic
distribution is given by:

$$ \hat{S}_k(u-t) - S_k(u-t) \rightarrow \mathcal{N}\left(0, H \cdot S_k(u-t) \cdot \left(1 - S_k(u-t)\right)\right) $$

where H is the Herfindahl index associated to the sample of new productions¹⁷.

Remark 72 This result can be deduced from the empirical estimation theory. Let S(t) be
the survival function of the survival time $\tau$. The empirical survival function of the weighted
sample $\{(w_j, \tau_j), j = 1, \ldots, n\}$ is equal to:

$$ \hat{S}(t) = \frac{\sum_{j=1}^{n} w_j D_j}{\sum_{j=1}^{n} w_j} $$

where $D_j = \mathbb{1}(\tau_j > t)$ is a Bernoulli random variable with parameter p = S(t). If we
assume that the sample observations are independent, we deduce that:

$$ \mathrm{var}\left(\hat{S}(t)\right) = \frac{\sum_{j=1}^{n} w_j^2 \, \mathrm{var}(D_j)}{\left(\sum_{j=1}^{n} w_j\right)^2} = H \cdot S(t) \cdot \left(1 - S(t)\right) $$


Example 70 We consider a sample of five loans that belong to the same balance sheet item.
Below, we have reported the value taken by $NP_j(t,u)$:


u - t |   0    1    2    3    4    5    6    7    8    9   10   11
#1    | 100   90   80   70   60   50   40   30   20   10    5    0
#2    |  70   65   55   40   20   10    5    0
#3    | 100   95   85   80   60   40   20   10    0
#4    |  50   47   44   40   37   33   27   17   10    7    0
#5    |  20   18   16   14   10    8    5    3    0


In Figure 7.7, we have estimated the amortization function $\hat{S}(u-t)$. We have also
computed the variance of the estimator and reported the 95% confidence interval¹⁸.
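
The estimation of Example 70 can be sketched as follows. This is an illustrative computation, not the code used for Figure 7.7: names are ours, the confidence bounds use the Gaussian approximation with the 1.96 quantile, and, as in the footnote, the sample is assumed to be made of 20 copies of the five loans, which divides the Herfindahl index by 20.

```python
import math

# Remaining amounts NP_j(t, u) for u - t = 0, 1, 2, ... (zero once the loan is repaid)
loans = {
    "#1": [100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, 0],
    "#2": [70, 65, 55, 40, 20, 10, 5, 0],
    "#3": [100, 95, 85, 80, 60, 40, 20, 10, 0],
    "#4": [50, 47, 44, 40, 37, 33, 27, 17, 10, 7, 0],
    "#5": [20, 18, 16, 14, 10, 8, 5, 3, 0],
}
horizon = 12
copies = 20  # assumption taken from the footnote: 100 loans = 20 copies of the sample


def remaining(path, lag):
    """NP_j(t, t + lag), equal to zero after the last reported date."""
    return path[lag] if lag < len(path) else 0


total = sum(path[0] for path in loans.values())
s_hat = [sum(remaining(path, lag) for path in loans.values()) / total
         for lag in range(horizon)]

# Herfindahl index of the weights w_j = NP_j(t) / sum NP_j(t), scaled for the copies
weights = [path[0] / total for path in loans.values()]
herfindahl = sum(w * w for w in weights) / copies

for lag, s in enumerate(s_hat):
    std = math.sqrt(herfindahl * s * (1.0 - s))
    lower, upper = max(s - 1.96 * std, 0.0), min(s + 1.96 * std, 1.0)
    print(lag, round(lower, 3), round(s, 3), round(upper, 3))
```

Each printed row gives the lower bound, the estimate and the upper bound for a given value of u - t.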


Liquidity duration Another important tool to measure the mismatch between assets and
liabilities is the liquidity duration, which is defined as the average time of the
amortization of the new production NP(t). In a discrete-time analysis, the amortization
value between two consecutive dates is equal to NP(t,u) - NP(t,u+1). Therefore, the
liquidity duration is the weighted average life (WAL) of the principal repayments:

$$ D(t) = \frac{\sum_{u=t}^{\infty} \left(NP(t,u) - NP(t,u+1)\right) \cdot (u-t)}{\sum_{u=t}^{\infty} \left(NP(t,u) - NP(t,u+1)\right)} $$


17 We have $H = \sum_{j \in k} w_j^2$ where:

$$ w_j = \frac{NP_j(t)}{\sum_{j' \in k} NP_{j'}(t)} $$

18 We have assumed that the sample is composed of 100 loans or 20 copies of the five previous loans.
Otherwise, the confidence interval is too large because the sample size is small.




[Plot of the estimate with its lower and upper bounds; horizontal axis from 0 to 12, vertical axis from 0.0 to 1.0]


FIGURE 7.7: Estimation of the amortization function Ŝ (u — t) 


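Since we have:

$$ NP(t,u) - NP(t,u+1) = -NP(t) \cdot \left(S(t,u+1) - S(t,u)\right) $$

and¹⁹:

$$ \sum_{u=t}^{\infty} \left(NP(t,u) - NP(t,u+1)\right) = -NP(t) \sum_{u=t}^{\infty} \left(S(t,u+1) - S(t,u)\right) = NP(t) $$

we obtain the following formula:

$$ D(t) = -\sum_{u=t}^{\infty} \left(S(t,u+1) - S(t,u)\right) \cdot (u-t) $$

In the continuous-time analysis, the liquidity duration is equal to:

$$ D(t) = \int_t^{\infty} \left(-\frac{\partial S(t,u)}{\partial u}\right) (u-t) \, \mathrm{d}u = \int_t^{\infty} (u-t) \, f(t,u) \, \mathrm{d}u $$

where f(t,u) is the density function associated to the survival function S(t,u).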


Remark 73 If we consider the stock approach of the liquidity duration, we have:

$$ D^{\star}(t) = \int_t^{\infty} (u-t) \, f^{\star}(t,u) \, \mathrm{d}u $$

where $f^{\star}(t,u)$ is the density function associated to the survival function $S^{\star}(t,u)$:

$$ f^{\star}(t,u) = -\frac{\partial S^{\star}(t,u)}{\partial u} $$


19 Because we have $S(t,t) = 1$ and $S(t,\infty) = 0$.


Some examples We consider the three main amortization schemes: bullet repayment, 
constant (or linear) amortization and exponential amortization. In Exercise 7.4.3 on page 
450, we have calculated the survival functions S (t, u) and S* (t, u), the liquidity durations 
D (t) and D* (t) and the outstanding dynamics dN (t), where m is the debt maturity and λ 
is the exponential parameter. Their expressions are reported below in Table 7.11. 


TABLE 7.11: Amortization function and liquidity duration of the three amortization schemes 

Amortization    S (t, u)                                   D (t)
Bullet          1{t ≤ u < t + m}                           m
Constant        1{t ≤ u < t + m} · (1 − (u − t)/m)         m/2
Exponential     e^{−λ(u−t)}                                1/λ

Amortization    S* (t, u)                                  D* (t)
Bullet          1{t ≤ u < t + m} · (1 − (u − t)/m)         m/2
Constant        1{t ≤ u < t + m} · (1 − (u − t)/m)²        m/3
Exponential     e^{−λ(u−t)}                                1/λ

Amortization    dN (t)
Bullet          dN (t) = (NP (t) − NP (t − m)) dt
Constant        dN (t) = (NP (t) − (1/m) ∫_{t−m}^{t} NP (s) ds) dt
Exponential     dN (t) = (NP (t) − λ N (t)) dt
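
As a quick numerical check of the liquidity durations of Table 7.11, the sketch below integrates each survival function. It relies on the identity D (t) = ∫ S (t, u) du, which follows from integrating the continuous-time formula by parts when (u − t) · S (t, u) vanishes at infinity. The helper integrate, the dictionary names and the truncation of the exponential case at 100 years are our own choices.

```python
import math

def integrate(func, lower, upper, steps=100_000):
    """Midpoint rule for the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))

m, lam = 10.0, 0.30  # maturity in years and exponential parameter

# Survival functions written in terms of the elapsed time x = u - t
flow = {
    "bullet": lambda x: 1.0 if x < m else 0.0,
    "constant": lambda x: max(1.0 - x / m, 0.0),
    "exponential": lambda x: math.exp(-lam * x),
}
stock = {
    "bullet": lambda x: max(1.0 - x / m, 0.0),
    "constant": lambda x: max(1.0 - x / m, 0.0) ** 2,
    "exponential": lambda x: math.exp(-lam * x),
}
# Integration horizon: m for finite maturities, 100 years as a proxy for infinity
horizon = {"bullet": m, "constant": m, "exponential": 100.0}

duration_flow = {k: integrate(f, 0.0, horizon[k]) for k, f in flow.items()}
duration_stock = {k: integrate(f, 0.0, horizon[k]) for k, f in stock.items()}

for name in flow:
    print(name, round(duration_flow[name], 4), round(duration_stock[name], 4))
```

The numerical values can be compared with m, m/2 and 1/λ for D (t), and with m/2, m/3 and 1/λ for D* (t).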


We have represented these amortization functions S (t, u) and S* (t, u) in Figure 7.8. 
The maturity m is equal to 10 years and the exponential parameter λ is set to 30%. Besides 
the three previous amortization schemes, we also consider the constant payment mortgage 
(CPM), whose survival functions are equal to²⁰: 

$$ S(t,u) = \mathbb{1}\{t \le u < t+m\} \cdot \frac{1 - e^{-i(t+m-u)}}{1 - e^{-im}} $$

and: 

$$ S^{\star}(t,u) = \mathbb{1}\{t \le u < t+m\} \cdot \frac{i(t+m-u) + e^{-i(t+m-u)} - 1}{im + e^{-im} - 1} $$

where i is the interest rate and m is the debt maturity. The CPM amortization scheme 
corresponds to the bottom/right panel²¹ in Figure 7.8. 


20 These expressions are derived in Exercise 7.4.3 on page 450. 


FIGURE 7.8: Amortization functions S (t, u) and S* (t, u) (panels: bullet, constant, exponential and CPM) 


Remark 74 We notice the convex profile of the constant and exponential amortization 
schemes, whereas the profile is concave for the CPM amortization scheme. Moreover, when 
the interest rate i goes to zero, the CPM profile corresponds to the constant profile. 
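
The two properties of Remark 74 can be checked numerically. The sketch below codes the CPM survival functions given above in terms of the elapsed time x = u − t; the function names are ours, and math.expm1 is used only to keep the ratios accurate when i is small.

```python
import math

def cpm_survival(x, m, i):
    """S(t, u) of a constant payment mortgage, with x = u - t."""
    if not 0.0 <= x < m:
        return 0.0
    # expm1(-a) = exp(-a) - 1, so the ratio equals (1 - e^{-i(m-x)}) / (1 - e^{-im})
    return math.expm1(-i * (m - x)) / math.expm1(-i * m)

def cpm_survival_stock(x, m, i):
    """S*(t, u) of a constant payment mortgage, with x = u - t."""
    if not 0.0 <= x < m:
        return 0.0
    remaining = m - x
    return ((i * remaining + math.expm1(-i * remaining))
            / (i * m + math.expm1(-i * m)))

m = 10.0
s_mid = cpm_survival(5.0, m, 0.05)           # CPM profile at mid-life, i = 5%
s_mid_limit = cpm_survival(5.0, m, 1e-6)     # close to the constant profile 1 - x/m
s_star_limit = cpm_survival_stock(5.0, m, 1e-6)  # close to (1 - x/m)^2

print(s_mid, s_mid_limit, s_star_limit)
```

At mid-life, the CPM function lies above the linear value 0.5, which is what a concave profile implies, and both functions approach the constant amortization case when i becomes very small.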


7.1.2.4 Liquidity hedging 


When we face a risk that is not acceptable, we generally hedge it. In the case of the 
liquidity, the concept of hedging is unclear. Indeed, at first sight, it seems that there are 
no liquidity forwards, swaps or options in the market. On the other hand, liquidity hedging 
seems to be trivial. Indeed, the bank can lend to other market participants when having an 
excess of funding, or the bank can borrow when having a deficit of funding. For that, it may 
use the interbank market or the bond market. Nevertheless, there is generally an uncertainty 
about the liquidity gap, because the amortization schedule and the new production are not 
known for sure. This is why banks must generally adopt a conservative approach. For 
instance, they must not lend (or buy bonds) too much. In a similar way, they must not 
borrow too short. The liquidity gap analysis is particularly important in order to split the 
decision between daily, weekly, monthly and quarterly adjustments. Let us assume that the 
bank anticipates a liquidity deficit of $10 mn for the next three months. It can borrow exactly 
$10 mn for three months. One month later, the bank finally has an excess of liquidity. It is 
obvious that the previous borrowing is not optimal because the bank must pay a three-month 
interest rate while it could have paid a one-month interest rate. 


The previous example shows that the management of the liquidity consists in managing 
interbank and bond operations. It is obvious that the funding program depends on the 
liquidity gap but also on the risk appetite of the bank. Some banks prefer to run a long- 
term liquidity program, others prefer to manage the liquidity on a shorter-term basis. The 
ALCO decisions may therefore have a big impact on the risk profile of the bank. The 2008 
Global Financial Crisis has demonstrated that liquidity management is key during periods 
of stress. For instance, a bank that has a structural liquidity excess may stop lending to 
the other participants in order to keep this liquidity for itself, while a bank that has a 
structural liquidity need may issue long-term debt in order to reduce day-to-day funding 
requirements. It is clear that ALCO decisions are beyond the scope of risk management and 
fall within strategic and business issues. 


21 The interest rate i is set to 5%. 


7.1.3 Interest rate risk in the banking book 


The ALM of interest rate risk is extensively developed in the next section. However, 
we give here the broad lines, notably the regulation framework, which has been elaborated 
by the Basel Committee in April 2016 (BCBS, 2016d) and which is known as IRRBB 
(or interest rate risk in the banking book). IRRBB can be seen as the revision of the 2004 
publication (BCBS, 2004b), but not solely. Indeed, this 2016 publication is relatively precise 
in terms of risk framework and defines a standardized framework, which was not the case in 
2004. In particular, capital requirements are more closely supervised than previously, even 
if IRRBB continues to be part of the Basel capital framework’s Pillar 2. 


7.1.3.1 Introduction on IRRBB 


Definition of IRRBB According to BCBS (2016d), “IRRBB refers to the current or 
prospective risk to the bank’s capital and earnings arising from adverse movements in interest 
rates that affect the bank’s banking book positions. When interest rates change, the present 
value and timing of future cash flows change. This in turn changes the underlying value of a 
bank’s assets, liabilities and off-balance sheet items and hence its economic value. Changes 
in interest rates also affect a bank’s earnings by altering interest rate-sensitive income and 
expenses, affecting its net interest income”. We notice that the Basel Committee considers 
both economic value (EV) and earnings-based risk measures. EV measures reflect changes 
in the net present value of the balance sheet resulting from IRRBB, whereas earnings-based 
measures reflect changes in the expected future profitability of the bank. Since EV measures 
are generally used by supervisors²² and earnings-based measures are more widely used by 
commercial banks²³, the Basel Committee thinks that the bank must manage these two 
risks because they capture two different time horizons. Economic value is calculated over 
the remaining life of debt instruments, implying a run-off balance sheet assumption. The 
earnings-based measure is calculated for a given time horizon, typically the next 12-month 
period. In this case, a constant or dynamic balance sheet assumption is more appropriate. 


Categories of IRR For the Basel Committee, there are three main sources of interest 
rate risk: gap risk, basis risk and option risk. Gap risk refers to the mismatch risk arising 
from the term structure of banking book instruments. It includes repricing risk and yield 
curve risk. Repricing risk corresponds to timing differences in the maturity or the risk of 
changes in interest rates between assets and liabilities. For example, if the bank funds a 
long-term fixed-rate loan with a short-term floating-rate deposit, the future income may 
decrease if interest rates increase. Therefore, repricing risk has two components. The first 
one is the maturity difference between assets and liabilities. The second one is the change in 
floating interest rates. Yield curve risk refers to non-parallel changes in the term structure 
of interest rates. A typical example concerns flattening, when short-term interest rates rise 
faster than long-term interest rates. 


22 Because they are more adapted for comparing banks. 
23 Because banks want to manage the volatility of earnings. 


Basis risk occurs when changes in interest rates impact differently financial instruments 
with similar repricing tenors, because they are priced using different interest rate indices. 
Therefore, basis risk corresponds to the correlation risk of interest rate indices with the 
same maturity. For example, the one-month Libor rate is not perfectly correlated to the 
one-month Treasury rate. Thus, there is a basis risk when a one-month Treasury-based asset 
is funded with a one-month Libor-based liability, because the margin can change from one 
month to another month. 


Option risk arises from option derivative positions or when the level or the timing of 
cash flows may change due to embedded options. A typical example is the prepayment 
risk. The Basel Committee distinguishes automatic option risk and behavioral option risk. 
Automatic options concern caps, floors, swaptions and other interest rate derivatives that 
are located in the banking book, while behavioral option risk includes fixed rate loans 
subject to prepayment risk, fixed rate loan commitments, term deposits subject to early 
redemption risk and non-maturity deposits (or NMDs). 


Risk measures The economic value of a series of cash flows CF = {CF (t_k), t_k ≥ t} is 
the present value of these cash flows: 

$$ \begin{aligned} EV &= PV_t(CF) \\ &= \mathbb{E}\left[\sum_{t_k \ge t} CF(t_k) \cdot e^{-\int_t^{t_k} r_s \, \mathrm{d}s}\right] \\ &= \sum_{t_k \ge t} CF(t_k) \cdot B(t, t_k) \end{aligned} $$

where B (t, t_k) is the discount factor (e.g. the zero-coupon bond) for the maturity date t_k. To 
calculate the economic value of the banking book, we slot all notional repricing cash flows 
of assets and liabilities into a set of time buckets. Then, we calculate the net cash flows, 
which are equal to CF (t_k) = CF_A (t_k) − CF_L (t_k) where CF_A (t_k) and CF_L (t_k) are the cash 
flows of assets and liabilities for the time bucket t_k. Finally, the economic value is given by: 

$$ \begin{aligned} EV &= \sum_{t_k \ge t} CF(t_k) \cdot B(t, t_k) \\ &= \sum_{t_k \ge t} CF_A(t_k) \cdot B(t, t_k) - \sum_{t_k \ge t} CF_L(t_k) \cdot B(t, t_k) \\ &= EV_A - EV_L \end{aligned} $$


It is equal to the present value of assets minus the present value of liabilities. By construc- 
tion, the computation of EV depends on the yield curve. We introduce the notation s in 
order to take into account a stress scenario of the yield curve. Then, we define the EV 
change as the difference between the EV for the base scenario and the EV for the given 
scenario s: 


$$ \begin{aligned} \Delta EV_s &= EV_0 - EV_s \\ &= \sum_{t_k \ge t} CF_0(t_k) \cdot B_0(t, t_k) - \sum_{t_k \ge t} CF_s(t_k) \cdot B_s(t, t_k) \end{aligned} $$


FIGURE 7.9: Relationship between A (t), L* (t) and E (t) 


In this equation, the base scenario is denoted by 0 and corresponds to the current term 
structure of interest rates. The stress scenario s of the yield curve impacts the discount 
factors, but also the cash flows that depend on the future interest rates. ΔEV_s > 0 then 
indicates a loss if the stress scenario s occurs. The Basel Committee defines the concept of 
economic value of equity (EVE or EV_E) as a specific form of EV where equity is excluded 
from the cash flows. We recall that the value of assets is equal to the value of liabilities at the 
current time t. If we distinguish pure liabilities L* (t) from the bank equity capital E (t), we 
obtain the balance sheet given in Figure 7.9. Since there is a perfect match between assets 
and liabilities, the value of the capital is equal to²⁴ E (t) = A (t) − L* (t). It follows that: 

$$ EVE = EV_A - EV_{L^{\star}} $$

We can then define ΔEVE_s as the loss ΔEV_s where we have excluded the equity from 
the computation of the cash flows. Said differently, ΔEVE_s is the capital loss if the stress 
scenario s occurs. 


Remark 75 Changes in economic value can also be measured with the PV01 metric or 
the economic value-at-risk (EVaR). PV01 is calculated by assuming a single basis point 
change in interest rates. EVaR is the value-at-risk measure applied to the economic value 
of the banking book. Like the VaR, it requires specifying the holding period and the confi- 
dence level. The Basel Committee motivates the choice of EVE instead of PV01 and EVaR, 
because it would like to measure the impact of losses on the capital in a stress testing 
framework. In particular, PV01 ignores basis risks whereas EVaR is designed for normal 
market circumstances. 


Earnings-based measures are computed using the net interest income (NII), which is the 
difference between the interest payments on assets and the interest payments on liabilities. 
Said differently, NII is the difference between interest rate revenues received by the bank 
and interest rate costs paid by the bank. For a given scenario s, we define the change in net 
interest income as follows: 

$$ \Delta NII_s = NII_0 - NII_s $$

As for the risk measure ΔEVE_s, ΔNII_s > 0 indicates a loss if the stress scenario s occurs. 


24 We have: 

$$ A(t) = L(t) = L^{\star}(t) + E(t) $$


Finally, the economic value and earnings-based risk measures are equal to the maximum 
of losses by considering the different scenarios: 

$$ \mathcal{R}(EVE) = \max_s \left(\Delta EVE_s, 0\right) $$

and: 

$$ \mathcal{R}(NII) = \max_s \left(\Delta NII_s, 0\right) $$

Since IRRBB is part of the second pillar, there are no minimum capital requirements K. 
Nevertheless, the Basel Committee imposes that R (EVE) must be lower than 15% of the 
bank’s tier 1 capital. 


7.1.3.2 Interest rate risk principles 


The Basel Committee defines nine IRR principles for banks and three IRR principles 
for supervisors. The first and second principles recall that banks must specifically manage 
IRRBB (and also CSRBB²⁵) and have a governing body that oversees IRRBB. The third 
and fourth principles explain that the risk appetite of the bank for IRRBB must be defined 
with respect to both economic value and earnings-based risk measures arising from interest 
rate shocks and stress scenarios. The objective is to measure the change in the net present 
value of the banking book and the future profitability. To compute ΔEVE, banks must 
consider a run-off balance sheet assumption, whereas they must use a constant or dynamic 
balance sheet and a rolling 12-month period for computing ΔNII. For that, they have to 
consider multiple interest rate scenarios, for example historical and hypothetical scenarios. 
Besides these internal scenarios, six external scenarios are defined by the Basel Committee²⁶: 
(1) parallel shock up; (2) parallel shock down; (3) steepener shock (short rates down and 
long rates up); (4) flattener shock (short rates up and long rates down); (5) short rates 
shock up; and (6) short rates shock down. The fifth principle deals with behavioral and 
modeling assumptions, in particular embedded optionalities. The last three principles deal 
with risk management and model governance process, the disclosure of the information and 
the capital adequacy policy. 

The role of supervisors is strengthened. They should collect on a regular basis sufficient 
information from the bank to assess its IRRBB exposure. This concerns modeling assump- 
tions, interest rate and option exposures, yield curve parameters, statistical methodologies, 
etc. An important task is also the identification of outlier banks. The outlier/materiality 
test compares the bank’s maximum ΔEVE (or R (EVE)) with 15% of its tier 1 capital. If 
this threshold is exceeded, supervisors must require mitigation actions, hedging programs 
and/or additional capital. 
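
The outlier test can be sketched in a few lines of Python. The function outlier_test and its arguments are our own illustrative names; the ΔEVE values and the tier 1 capital of 200 are those of the USD balance sheet example treated later in this section (Table 7.15).

```python
def outlier_test(delta_eve_by_scenario, tier1_capital, threshold=0.15):
    """Return R(EVE), its ratio to tier 1 capital and the outlier flag."""
    risk_measure = max(max(delta_eve_by_scenario.values()), 0.0)
    ratio = risk_measure / tier1_capital
    return risk_measure, ratio, ratio > threshold


# Delta EVE per supervisory scenario (values reported in Table 7.15)
delta_eve = {1: 28.69, 2: -33.58, 3: 12.67, 4: -6.24, 5: 6.97, 6: -7.27}
risk_measure, ratio, is_outlier = outlier_test(delta_eve, tier1_capital=200.0)

print(risk_measure, ratio, is_outlier)
```

With these inputs, the worst scenario is the parallel shock up and the bank stays below the 15% threshold.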


7.1.3.3 The standardized approach 


Overview of the standardized framework There are five steps for measuring the 
bank’s IRRBB: 


1. The first step consists in allocating the interest rate sensitivities of the banking book 
to three categories: 


(a) amenable to standardization²⁷; 
(b) less amenable to standardization²⁸; 
(c) not amenable to standardization²⁹. 


25 Credit spread risk in the banking book. 

26 These scenarios are described in the next paragraph on page 397. 

27 The Basel Committee distinguishes two main categories: fixed rate positions and floating rate positions. 

28 They concern explicit automatic interest rate options. 

29 This category is composed of NMDs, fixed rate loans subject to prepayment risk and term deposits 
subject to early redemption risk. 
2. Then, the bank must slot cash flows (assets, liabilities and off-balance sheet items) into 
19 predefined time buckets³⁰: overnight (O/N), O/N-1M, 1M-3M, 3M-6M, 6M-9M, 
9M-1Y, 1Y-1.5Y, 1.5Y-2Y, 2Y-3Y, 3Y-4Y, 4Y-5Y, 5Y-6Y, 6Y-7Y, 7Y-8Y, 8Y-9Y, 
9Y-10Y, 10Y-15Y, 15Y-20Y, 20Y+. This concerns positions amenable to standard- 
ization. Positions less amenable to standardization are excluded from this 
step. For positions with embedded automatic interest rate options, the optionality is 
ignored. 


3. The bank determines ΔEVE_{s,c} for each interest rate scenario s and each currency c. 


4. In the fourth step, the bank calculates the total measure for automatic interest rate 
option risk KAO_{s,c}. 


5. Finally, the bank calculates the EVE risk measure for each interest rate shock s: 

$$ \mathcal{R}(EVE_s) = \max\left(\sum_c \left(\Delta EVE_{s,c} + KAO_{s,c}\right)^{+}, 0\right) $$

The standardized EVE risk measure is the maximum loss across all the interest rate 
shock scenarios: 

$$ \mathcal{R}(EVE) = \max_s \mathcal{R}(EVE_s) $$
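
The aggregation of step 5 can be sketched as follows. The function name, the nested dictionaries and all the figures are hypothetical and serve only to show how the positive part is taken currency by currency before summing.

```python
def standardized_eve_risk(delta_eve, kao):
    """Aggregate Delta EVE and KAO across currencies for each scenario.

    delta_eve[s][c] and kao[s][c] are indexed by scenario s and currency c.
    Only the losses (positive parts) of each currency are added up.
    """
    per_scenario = {}
    for s, by_currency in delta_eve.items():
        total = sum(max(by_currency[c] + kao[s][c], 0.0) for c in by_currency)
        per_scenario[s] = max(total, 0.0)
    return per_scenario, max(per_scenario.values())


# Hypothetical figures for two scenarios and two currencies
delta_eve = {1: {"USD": 20.0, "EUR": -5.0}, 2: {"USD": -15.0, "EUR": 8.0}}
kao = {1: {"USD": 2.0, "EUR": 1.0}, 2: {"USD": 3.0, "EUR": 1.0}}

risk_by_scenario, risk_eve = standardized_eve_risk(delta_eve, kao)
print(risk_by_scenario, risk_eve)
```

In this toy case, the EUR gain in scenario 1 does not offset the USD loss, because gains in one currency are floored at zero before aggregation.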


The supervisory interest rate shock scenarios The six stress scenarios are based on 
three shock sizes that the Basel Committee has calibrated using the period 2010-2015: 
the parallel shock size S_0, the short shock size S_1 and the long shock size S_2. In the table 
below, we report their values for some currencies³¹: 


Shock size (in bps)   USD/CAD/SEK   EUR/HKD   GBP   JPY    EM
S_0 (parallel)                200       200   250   100   400
S_1 (short)                   300       250   300   100   500
S_2 (long)                    150       100   150   100   300


where EM is composed of ARS, BRL, INR, MXN, RUB, TRY and ZAR. Given S_0, S_1 and 
S_2, we calculate the following generic shocks for a given maturity t: 


         Parallel shock       Short rates shock    Long rates shock
         ΔR^(parallel) (t)    ΔR^(short) (t)       ΔR^(long) (t)
Up       +S_0                 +S_1 · e^{−t/τ}      +S_2 · (1 − e^{−t/τ})
Down     −S_0                 −S_1 · e^{−t/τ}      −S_2 · (1 − e^{−t/τ})


where τ is equal to four years. Finally, the six standardized interest rate shock scenarios 
are defined as follows: 


1. Parallel shock up: 

$$ \Delta R^{(\mathrm{parallel})}(t) = +S_0 $$

2. Parallel shock down: 

$$ \Delta R^{(\mathrm{parallel})}(t) = -S_0 $$

3. Steepener shock (short rates down and long rates up): 

$$ \Delta R^{(\mathrm{steepener})}(t) = 0.90 \cdot \left|\Delta R^{(\mathrm{long})}(t)\right| - 0.65 \cdot \left|\Delta R^{(\mathrm{short})}(t)\right| $$

4. Flattener shock (short rates up and long rates down): 

$$ \Delta R^{(\mathrm{flattener})}(t) = 0.80 \cdot \left|\Delta R^{(\mathrm{short})}(t)\right| - 0.60 \cdot \left|\Delta R^{(\mathrm{long})}(t)\right| $$

5. Short rates shock up: 

$$ \Delta R^{(\mathrm{short})}(t) = +S_1 \cdot e^{-t/\tau} $$

6. Short rates shock down: 

$$ \Delta R^{(\mathrm{short})}(t) = -S_1 \cdot e^{-t/\tau} $$


30 The buckets are indexed by k from 1 to 19. For each bucket, the midpoint is used for defining the 
corresponding maturity t_k. We have t_1 = 0.0028, t_2 = 0.0417, t_3 = 0.1667, t_4 = 0.375, t_5 = 0.625, ..., 
t_17 = 12.5, t_18 = 17.5 and t_19 = 25. 

31 The values for a more comprehensive list of currencies are given in BCBS (2016d) on page 44. 

Example 71 We assume that S_0 = 100 bps, S_1 = 150 bps and S_2 = 200 bps. We would 
like to calculate the standardized shocks for the one-year maturity. 


The parallel shock up is equal to +100 bps, while the parallel shock down is equal to 
−100 bps. For the short rates shock, we obtain: 

$$ \Delta R^{(\mathrm{short})}(t) = 150 \times e^{-1/4} = 116.82 \text{ bps} $$

for the up scenario and −116.82 bps for the down scenario. The long rates shock is equal 
to 200 × (1 − e^{−1/4}) = 44.24 bps in absolute value. Since we have |ΔR^(short) (t)| = 
116.82 and |ΔR^(long) (t)| = 44.24, the steepener shock is equal to: 

$$ \Delta R^{(\mathrm{steepener})}(t) = 0.90 \times 44.24 - 0.65 \times 116.82 = -36.12 \text{ bps} $$

For the flattener shock, we have: 

$$ \Delta R^{(\mathrm{flattener})}(t) = 0.80 \times 116.82 - 0.60 \times 44.24 = 66.91 \text{ bps} $$

In Figure 7.10, we have represented the six interest rate shocks ΔR (t) for the set of param- 
eters (S_0 = 100, S_1 = 150, S_2 = 200). 
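
The figures of Example 71 can be reproduced with a short Python sketch of the six scenario formulas. The function name standardized_shocks and the dictionary keys are our own; the shock sizes are expressed in bps and the maturity in years.

```python
import math

def standardized_shocks(t, s0, s1, s2, tau=4.0):
    """Six supervisory shocks (in bps) for the maturity t (in years)."""
    short = s1 * math.exp(-t / tau)           # |Delta R(short)(t)|
    long_ = s2 * (1.0 - math.exp(-t / tau))   # |Delta R(long)(t)|
    return {
        "parallel up": s0,
        "parallel down": -s0,
        "steepener": 0.90 * long_ - 0.65 * short,
        "flattener": 0.80 * short - 0.60 * long_,
        "short up": short,
        "short down": -short,
    }


shocks_1y = standardized_shocks(1.0, s0=100.0, s1=150.0, s2=200.0)
for name, value in shocks_1y.items():
    print(f"{name:14s} {value:8.2f}")
```

Calling the function on a grid of maturities gives the type of curves shown in Figure 7.10.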

In Figure 7.11, we consider the yield curve generated by the Nelson-Siegel model³² with 
the following parameters: θ_1 = 8%, θ_2 = −7%, θ_3 = 6% and θ_4 = 10. Then, we apply the 
standardized interest rate shocks by considering EUR and EM currencies. We verify that 
the parallel shock moves the yield curve uniformly, the steepener shock increases the slope 
of the yield curve, the flattener shock reduces the spread between long and short interest 
rates, and the short rates shock has no impact on the long maturities after 10 years. We 
also notice that the deformation of the yield curve is more important for EM currencies 
than for the EUR currency. 


Treatment of NMDs NMDs are segmented into three categories: retail transactional, 
retail non-transactional and wholesale. Then, the bank must estimate the stable and non- 
stable parts of each category³³. Finally, the stable part of NMDs must be split between core 
and non-core deposits. However, the Basel Committee imposes a cap ω⁺ on the proportion 
of core deposits (see Table 7.12). For instance, core deposits cannot exceed 70% of the retail 
non-transactional stable deposits. The time bucket for non-core deposits is set to overnight 
(or the first time bucket), meaning that the corresponding time bucket midpoint is equal 
to t_1 = 0.0028. For core deposits, the bank determines the appropriate cash flow slotting, 
but the average maturity cannot exceed the cap t⁺, which is given in Table 7.12. 


32 We recall that it is defined in Footnote 8 on page 131. 
33 This estimation must be based on the historical data of the last 10 years. 


FIGURE 7.10: Interest rate shocks (in bps) 


TABLE 7.12: Cap on core deposits and maximum average maturity 

Category                    Cap ω⁺    Cap t⁺ (in years)
Retail transactional           90%    5.0
Retail non-transactional       70%    4.5
Wholesale                      50%    4.0
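
The caps of Table 7.12 can be applied as in the sketch below. The function split_nmd, the input figures and the treatment of everything that is not core as non-core (overnight bucket) are our own simplifying assumptions, not a statement of the regulatory text.

```python
# Caps of Table 7.12: (cap on the share of core deposits, cap on average maturity in years)
CAPS = {
    "retail transactional": (0.90, 5.0),
    "retail non-transactional": (0.70, 4.5),
    "wholesale": (0.50, 4.0),
}

def split_nmd(category, total, stable_share, core_share_of_stable):
    """Split non-maturity deposits into core and non-core amounts.

    stable_share is the bank's estimate of the stable part of the deposits,
    core_share_of_stable its estimate of the core part of stable deposits,
    which is capped by the regulatory proportion of the category.
    """
    cap_share, cap_maturity = CAPS[category]
    stable = total * stable_share
    core = stable * min(core_share_of_stable, cap_share)
    return {
        "core": core,                       # slotted by the bank, average maturity capped
        "non_core": total - core,           # slotted into the overnight bucket (assumption)
        "max_average_maturity": cap_maturity,
    }


# Hypothetical portfolio: 1000 of retail non-transactional deposits
result = split_nmd("retail non-transactional", 1000.0, 0.80, 0.85)
print(result)
```

Here the bank's own estimate of 85% is overridden by the 70% cap, so that core deposits are limited to 70% of the stable amount.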


Behavioral options of retail customers This category mainly concerns fixed rate loans 
because of the prepayment risk, and fixed-term deposits because of the early redemption 
risk. The Basel Committee proposes to use a two-step procedure. First, the bank determines 
the baseline estimate of each category given the current yield curve. Then, the baseline 
estimate is multiplied by a scalar that depends on the standardized interest rate scenario. 
In the case of fixed rate loans subject to prepayment risk, the bank establishes the different 
homogeneous prepayment categories. For each category, the bank estimates the baseline conditional 
prepayment rate CPR_0 and calculates the stressed conditional prepayment rate as follows: 

$$ CPR_s = \min\left(1, \gamma_s \cdot CPR_0\right) $$

where γ_s is the multiplier for the scenario s. The coefficient γ_s takes two values: 

• γ_s = 0.8 for the scenarios 1, 3 and 5 (parallel up, steepener and short rates up); 
• γ_s = 1.2 for the scenarios 2, 4 and 6 (parallel down, flattener and short rates down). 

The cash flow for the time bucket t_k is the sum of two components: 

$$ CF_s(t_k) = CF_s^1(t_k) + CF_s^2(t_k) $$

where CF_s^1 (t_k) refers to the scheduled interest and principal repayment (without prepay- 
ment) and CF_s^2 (t_k) refers to the prepayment cash flow: 

$$ CF_s^2(t_k) = CPR_s \cdot N_s(t_{k-1}) $$

where N_s (t_{k−1}) is the notional outstanding at time bucket t_{k−1} calculated with the stress 
scenario s. 


The methodology for term deposits subject to early redemption risk is similar to the one 
of the fixed rate loans subject to prepayment risk. First, the bank estimates the baseline 
term deposit redemption ratio TDRR_0 for each homogeneous portfolio. Second, the stressed 
term deposit redemption ratio is equal to: 

$$ TDRR_s = \min\left(1, \gamma_s \cdot TDRR_0\right) $$

where γ_s is the multiplier for the scenario s. The coefficient γ_s takes two values: 

• γ_s = 1.2 for the scenarios 1, 4 and 5 (parallel up, flattener and short rates up); 
• γ_s = 0.8 for the scenarios 2, 3 and 6 (parallel down, steepener and short rates down). 

Third, the term deposits which are expected to be redeemed early are slotted into the 
overnight time bucket, implying that the corresponding cash flows are given by: 

$$ CF_s(t_1) = TDRR_s \cdot N_0 $$

where N_0 is the outstanding amount of term deposits. 
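
The two stressed ratios share the same structure, which the following Python sketch makes explicit. The multipliers are those listed above; the function names, the baseline ratios and the notional amounts are hypothetical.

```python
# Scenario multipliers: 1 = parallel up, 2 = parallel down, 3 = steepener,
# 4 = flattener, 5 = short rates up, 6 = short rates down
CPR_MULTIPLIER = {1: 0.8, 2: 1.2, 3: 0.8, 4: 1.2, 5: 0.8, 6: 1.2}
TDRR_MULTIPLIER = {1: 1.2, 2: 0.8, 3: 0.8, 4: 1.2, 5: 1.2, 6: 0.8}

def stressed_ratio(baseline, scenario, multipliers):
    """Stressed CPR or TDRR: min(1, gamma_s * baseline)."""
    return min(1.0, multipliers[scenario] * baseline)

def prepayment_cash_flow(cpr_s, outstanding_previous):
    """CF^2_s(t_k) = CPR_s * N_s(t_{k-1})."""
    return cpr_s * outstanding_previous

def early_redemption_cash_flow(tdrr_s, outstanding):
    """CF_s(t_1) = TDRR_s * N_0, slotted into the overnight bucket."""
    return tdrr_s * outstanding


# Hypothetical baselines: CPR_0 = 5% and TDRR_0 = 10%
cpr_down = stressed_ratio(0.05, 2, CPR_MULTIPLIER)    # parallel shock down
tdrr_up = stressed_ratio(0.10, 1, TDRR_MULTIPLIER)    # parallel shock up
capped = stressed_ratio(0.90, 2, CPR_MULTIPLIER)      # the cap at 1 is binding

print(cpr_down, prepayment_cash_flow(cpr_down, 1000.0))
print(tdrr_up, early_redemption_cash_flow(tdrr_up, 500.0))
print(capped)
```

The two multiplier tables differ for the steepener and flattener scenarios, because prepayments react to the level and slope of the curve whereas early redemptions mainly react to short rates.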
Remark 76 Fixed rate loans subject to prepayment risk and term deposits subject to early 
redemption risk follow the same methodology, but with two main differences. The first one 
concerns the impact of the stress scenario on the stress ratios CPR_s and TDRR_s. In the case 
of prepayment risk, the conditional prepayment rate generally increases when interest rates 
are falling and decreases when interest rates are rising. This is why we have CPR_s > CPR_0 
for the scenarios where interest rates or the slope of the yield curve decrease (scenarios 
2, 4 and 6). In the case of early redemption risk, the term deposit redemption ratio mainly 
depends on the short term interest rates. In particular, the ratio TDRR_s must increase when 
short rates increase, because this creates an incentive to negotiate a term deposit contract 
with a higher interest rate. 


Automatic interest rate options The computation of the automatic interest rate op- 
tion risk KAO_s is given by: 

$$ KAO_s = \sum_{i \in \mathcal{S}} \Delta FVAO_{s,i} - \sum_{i \in \mathcal{B}} \Delta FVAO_{s,i} $$

where: 

• i ∈ S denotes an automatic interest rate option which is sold by the bank; 

• i ∈ B denotes an automatic interest rate option which is bought by the bank; 

• FVAO_{0,i} is the fair value of the automatic option i given the current yield curve and 
the current implied volatility surface; 

• FVAO_{s,i} is the fair value of the automatic option i given the stressed yield curve and 
a relative increase in the implied volatility of 25%; 

• ΔFVAO_{s,i} is the change in the fair value of the option: 

$$ \Delta FVAO_{s,i} = FVAO_{s,i} - FVAO_{0,i} $$
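
A minimal sketch of this computation is given below. The data layout (a list of dictionaries with a position flag and two fair values) and the figures are hypothetical; in practice, the fair values would come from the bank's option pricing models.

```python
def automatic_option_risk(options):
    """KAO_s: fair value changes of sold options minus those of bought options.

    Each option is a dict with keys 'position' ('sold' or 'bought'),
    'fv_base' (current fair value) and 'fv_stressed' (fair value under the
    stressed yield curve and the 25% relative increase in implied volatility).
    """
    total = 0.0
    for option in options:
        change = option["fv_stressed"] - option["fv_base"]
        total += change if option["position"] == "sold" else -change
    return total


# Hypothetical book: one cap sold by the bank and one floor bought by the bank
book = [
    {"position": "sold", "fv_base": 10.0, "fv_stressed": 14.0},
    {"position": "bought", "fv_base": 6.0, "fv_stressed": 7.0},
]
kao_s = automatic_option_risk(book)
print(kao_s)
```

An increase in the value of a sold option is a loss for the bank, whereas an increase in the value of a bought option is a gain, hence the opposite signs.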


An example We consider a simplified USD-denominated balance sheet. The assets are 
composed of loans with the following cash flow slotting: 


Instruments    Loans   Loans   Loans
Maturity          1Y      5Y     13Y
Cash flows       200     700     100


The loans are then slotted into three main buckets (short-term, medium-term and long-term 
loans). The average maturity is respectively equal to one year, five years and thirteen years. 
The liabilities are composed of retail deposit accounts, term deposits, debt and tier-one 
equity capital. The cash flow slotting is given below: 


Instruments    Non-core    Term        Core        Debt    Debt    Equity
               deposits    deposits    deposits    ST      LT      capital
Maturity       O/N         7M          3Y          4Y      8Y
Cash flows     100         50          450         100     100     200


The non-maturity deposits are split into non-core and core deposits. The maturity of non- 
core deposits is assumed to be overnight (O/N), whereas the estimated maturity of core 
deposits is around three years. We also have two debt instruments: one with a remaining 
maturity of four years and another one with a remaining maturity of eight years. The term 
deposits are slotted in a single bucket corresponding to a seven-month maturity. 


TABLE 7.13: Economic value of the assets 

Bucket    t_k      CF_0 (t_k)    R_0 (t_k)    EV_0 (t_k)
6         0.875    200           1.55%        197.31
11        4.50     700           3.37%        601.53
17        12.50    100           5.71%         48.98
EV_0                                          847.82


TABLE 7.14: Economic value of the pure liabilities 

Bucket    t_k       CF_0 (t_k)    R_0 (t_k)    EV_0 (t_k)
1         0.0028    100           1.00%        100.00
5         0.625     50            1.39%         49.57
9         2.50      450           2.44%        423.35
10        3.50      100           2.93%         90.26
14        7.50      100           4.46%         71.56
EV_0                                           734.73


We assume that the current yield curve is given by the Nelson-Siegel model with θ_1 = 8%, 
θ_2 = −7%, θ_3 = 6% and θ_4 = 10. In Table 7.13, we have reported the current economic value 
of the assets. It is respectively equal to 197.31, 601.53 and 48.98 for the three buckets and 
847.82 for the total of assets. We have done the same exercise for the pure liabilities (Table 
7.14). We obtain an economic value equal to 734.73. We deduce that the current economic 
value of equity is EVE_0 = 847.82 − 734.73 = 113.09. Since the balance sheet is expressed in 
USD, we use the USD shocks for the interest rates scenarios: S_0 = 200 bps, S_1 = 300 bps and 
S_2 = 150 bps. In Table 7.15, we have reported the stressed values of interest rates R_s (t_k) and 
economic value EV_s (t_k) for every bucket of the balance sheet. By computing the stressed 
economic value of assets and pure liabilities, we deduce the stressed economic value of equity. 
For instance, in the case of the first stress scenario, we have EVE_1 = 781.79 − 697.39 = 84.41. 
It follows that the economic value of equity will be reduced if the standardized parallel shock 
up occurs: ΔEVE_1 = 113.10 − 84.41 = 28.69. We observe that the economic value of equity 
decreases for scenarios 1, 3 and 5, and increases for scenarios 2, 4 and 6. Finally, we deduce 
that the risk measure R (EVE) = max_s (ΔEVE_s, 0) = 28.69 represents 14.3% of the equity. 
This puts the bank under the 15% threshold of the materiality test. 
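
The whole example can be reproduced with the Python sketch below. Two ingredients are assumptions on our side, because they are not restated in this section: the Nelson-Siegel parameterization (with θ_4 used as a time scale) and the continuously compounded discount factor exp(−R · t). With these choices, the code matches the base values of Tables 7.13 and 7.14 up to rounding.

```python
import math

THETA = (0.08, -0.07, 0.06, 10.0)          # Nelson-Siegel parameters
S0, S1, S2, TAU = 0.02, 0.03, 0.015, 4.0   # USD shock sizes and decay parameter

# Cash flows slotted at the bucket midpoints t_k
ASSETS = {0.875: 200.0, 4.5: 700.0, 12.5: 100.0}
LIABILITIES = {0.0028: 100.0, 0.625: 50.0, 2.5: 450.0, 3.5: 100.0, 7.5: 100.0}

def nelson_siegel(t):
    """Zero-coupon rate for the maturity t (assumed parameterization)."""
    th1, th2, th3, th4 = THETA
    x = t / th4
    slope = (1.0 - math.exp(-x)) / x
    return th1 + th2 * slope + th3 * (slope - math.exp(-x))

def shock(t, scenario):
    """Supervisory shock for the scenario 1..6 (0 is the base scenario)."""
    if scenario == 0:
        return 0.0
    short = S1 * math.exp(-t / TAU)
    long_ = S2 * (1.0 - math.exp(-t / TAU))
    return {1: S0, 2: -S0,
            3: 0.90 * long_ - 0.65 * short,
            4: 0.80 * short - 0.60 * long_,
            5: short, 6: -short}[scenario]

def economic_value(cash_flows, scenario=0):
    """Present value of the slotted cash flows under a given scenario."""
    return sum(cf * math.exp(-(nelson_siegel(t) + shock(t, scenario)) * t)
               for t, cf in cash_flows.items())

eve = {s: economic_value(ASSETS, s) - economic_value(LIABILITIES, s)
       for s in range(7)}
delta_eve = {s: eve[0] - eve[s] for s in range(1, 7)}
risk_measure = max(max(delta_eve.values()), 0.0)

for s, value in delta_eve.items():
    print(s, round(value, 2))
print(round(risk_measure / 200.0, 4))  # ratio to the tier 1 equity capital
```

The equity itself does not enter the cash flows, which is why only the pure liabilities are discounted.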


7.1.4 Other ALM risks 


Even if liquidity and interest rate risks are the main ALM risks, there are other risks 
that impact the banking book of the balance sheet, in particular currency risk and credit 
spread risk. 


7.1.4.1 Currency risk 


We recall that the standardized approach for implementing IRRBB considers each 
currency separately. Indeed, the risk measures ΔEVE_{s,c} and KAO_{s,c} are calculated 
for each interest rate scenario s and each currency c. Then, the aggregated value 
Σ_c (ΔEVE_{s,c} + KAO_{s,c})⁺ is calculated across the different currencies and the maximum is 
selected for the global risk measure of the bank. 


TABLE 7.15: Stressed economic value of equity 

Bucket           s = 1     s = 2     s = 3     s = 4     s = 5     s = 6
Assets
R_s (t_6)        3.55%    −0.45%     0.24%     3.30%     3.96%    −0.87%
R_s (t_11)       5.37%     1.37%     3.65%     3.54%     4.34%     2.40%
R_s (t_17)       7.71%     3.71%     6.92%     4.96%     5.84%     5.58%
EV_s (t_6)      193.89    200.80    199.57    194.31    193.20    201.52
EV_s (t_11)     549.76    658.18    594.03    596.91    575.74    628.48
Pure liabilities
R_s (t_1)        3.00%    −1.00%    −0.95%     3.40%     4.00%    −2.00%
R_s (t_5)        3.39%    −0.61%    −0.08%     3.32%     3.96%    −1.17%
R_s (t_9)        4.44%     0.44%     2.03%     3.31%     4.05%     0.84%
R_s (t_10)       4.93%     0.93%     2.90%     3.40%     4.18%     1.68%
R_s (t_14)       6.46%     2.46%     5.31%     4.07%     4.92%     4.00%
EV_s (t_1)       99.99    100.00    100.00     99.99     99.99    100.01
EV_s (t_5)       48.95     50.19     50.02     48.97     48.78     50.37
EV_s (t_9)      402.70    445.05    427.77    414.27    406.69    440.69
EV_s (t_10)      84.16     96.80     90.34     88.77     86.39     94.30
EV_s (t_14)      61.59     83.14     67.17     73.70     69.13     74.07
EV_s            697.39    775.18    735.31    725.71    710.98    759.43
Equity
EVE_s            84.41    146.68    100.43    119.34    106.13    120.37
ΔEVE_s           28.69    −33.58     12.67     −6.24      6.97     −7.27

One of the issues concerns currency hedging. Generally, it is done by rolling reverse 
FX forward contracts, implying that the hedging cost is approximately equal to i* − i, 
where i is the domestic interest rate and i* is the foreign interest rate. This relationship 
comes from the covered interest rate parity (CIP). We deduce that the hedging cost can 
be large when i* ≫ i. This has been particularly true for European and Japanese banks, 
because these regions have experienced some periods of low interest rates. The question 
of full hedging, partial hedging or no hedging has then been readdressed after the 2008 
Global Financial Crisis. Most banks continue to fully hedge the banking book including 
the equity capital, but it is not obvious that it is optimal. Another issue has concerned the 
access to dollar funding of non-US banks. Traditionally, “their branches and subsidiaries 
in the United States were a major source of dollar funding, but the role of these affiliates 
has declined” (Aldasoro and Ehlers, 2018, page 15). Today, we notice that many non-US 
banks issue USD-denominated debt instruments in order to access dollar funding³⁴. 
Banks must now manage a complex multi-currency balance sheet, implying that currency 
management has become an important topic in ALM. 


7.1.4.2 Credit spread risk 


According to BCBS (2016d), credit spread risk in the banking book (CSRBB) is driven 
“by changes in market perception about the credit quality of groups of different credit-risky 
instruments, either because of changes to expected default levels or because of changes to 
market liquidity”. In Figure 7.12, we have reproduced the scheme provided by the Basel 
Committee in order to distinguish IRRBB and CSRBB. Therefore, CSRBB can be seen 
as the ALM spread risk of credit-risky instruments which is not explained by IRRBB and 
idiosyncratic credit risk. However, the definition provided by the Basel Committee is too 
broad, and does not avoid double counting with credit and jump-to-default risk. As of 
the date of the publication of this book, the debate on CSRBB is far from finished³⁵, even if 
CSRBB must be monitored and assessed since 2018. 


34 See for instance annual reports of European and Japanese banks. 


[Figure 7.12 omitted. Recoverable labels: items at amortized cost versus items at fair value (MtM); 
risk-free rate, funding rate, funding margin, administered rate, credit margin, market duration 
spread, market liquidity spread, market credit spread, idiosyncratic credit spread.] 


FIGURE 7.12: Components of interest rates 


Source: BCBS (2016d, page 34). 


7.2 Interest rate risk 


In this section, we focus on the ALM tools that are related to the interest rate risk in the 
banking book. We first introduce the concept of duration gap and show how it is related 
to the economic value risk of the banking book. Then, we present the different ways to 
calculate earnings-at-risk (EaR) measures and focus more particularly on the net interest 
income sensitivity and the interest rate hedging strategies. The third part is dedicated to 
funds transfer pricing, whose objective is to centralize all interest rate risks, to manage 
them and to allocate profit between the different business units. Finally, we present an 
econometric model for simulating and evaluating interest rate scenarios. 


³⁵ See for example the position of the European Banking Federation (EBF): 
www.ebf.eu/regulation-supervision/credit-spread-risk-in-the-banking-book-ebf-position. 


7.2.1 Duration gap analysis 


In this section, we focus on the duration gap analysis, which is an approximation of the 
repricing gap analysis we have previously presented. The idea is to obtain an estimation of 
ΔEVE. Although this approach is only valid in the case of parallel interest rate shocks³⁶, 
it is an interesting method because we obtain closed-form formulas. In the case of non- 
parallel interest rate scenarios or if we want to obtain more accurate results, it is better to 
implement the repricing gap analysis, which consists in computing the stressed economic 
value of assets and liabilities in order to deduce the impact on the economic value of equity. 


7.2.1.1 Relationship between Macaulay duration and modified duration 
We consider a financial asset, whose price is given by the present value of cash flows: 

$$V = \sum_{t_k \geq t} B(t, t_k) \cdot \mathrm{CF}(t_k)$$

where CF(t_k) is the cash flow at time t_k and B(t, t_k) is the associated discount factor. The 
Macaulay duration D is the weighted average of the cash flow maturities: 

$$D = \sum_{t_k \geq t} w(t, t_k) \cdot (t_k - t)$$

where w(t, t_k) is the weight associated to the cash flow at time t_k: 

$$w(t, t_k) = \frac{B(t, t_k) \cdot \mathrm{CF}(t_k)}{V}$$

In the case of a zero-coupon bond whose maturity date is T, the Macaulay duration is equal 
to the remaining maturity T − t. 


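Let us define the yield to maturity y as the solution of the following equation: 

$$V = \sum_{t_k \geq t} \frac{\mathrm{CF}(t_k)}{(1 + y)^{(t_k - t)}}$$

We have: 

$$\begin{aligned}
\frac{\partial V}{\partial y} &= \sum_{t_k \geq t} -(t_k - t) \cdot (1 + y)^{-(t_k - t) - 1} \cdot \mathrm{CF}(t_k) \\
&= -\frac{D}{1 + y} \cdot V \\
&= -\mathcal{D} \cdot V
\end{aligned}$$

where $\mathcal{D}$ is the modified duration: 

$$\mathcal{D} = \frac{D}{1 + y}$$

We deduce that the modified duration is the price sensitivity measure: 

$$\mathcal{D} = -\frac{1}{V} \cdot \frac{\partial V}{\partial y} = -\frac{\partial \ln V}{\partial y}$$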


³⁶ The duration gap analysis can be viewed as the first-order approximation of the repricing gap analysis. 

If the yield to maturity is low, the modified duration $\mathcal{D}$ is close to the Macaulay duration D. 
While the Macaulay duration is easier to interpret, the modified duration is more relevant to 
understand the impact of an interest rate stress scenario. Indeed, we have: 

$$\Delta V \approx -\mathcal{D} \cdot V \cdot \Delta y$$

Nevertheless, we can use the following alternative formula to evaluate the impact of an 
interest rate parallel shift: 

$$\Delta V \approx -D \cdot V \cdot \frac{\Delta y}{1 + y}$$


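Remark 77 Using a continuous-time framework, the yield to maturity is defined as the 
root of the following equation: 

$$V = \sum_{t_k \geq t} e^{-y (t_k - t)} \cdot \mathrm{CF}(t_k)$$

We deduce that: 

$$\begin{aligned}
\frac{\partial V}{\partial y} &= \sum_{t_k \geq t} -(t_k - t) \cdot e^{-y (t_k - t)} \cdot \mathrm{CF}(t_k) \\
&= -D \cdot V
\end{aligned}$$

It follows that the modified duration $\mathcal{D}$ is defined as the Macaulay duration D in 
continuous-time modeling. 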


Example 72 We consider the following cash flows stream {t_k, CF(t_k)}: 


t_k      | 1     4     7     11 
CF(t_k)  | 200   500   200   100 


The current zero-coupon interest rates are: R(1) = 2%, R(4) = 3%, R(7) = 4%, and 
R(11) = 5%. 


If we consider the discrete-time modeling framework, we obtain V = 850.77, y = 3.61%, 
D = 4.427 and $\mathcal{D}$ = 4.273. A parallel shock of +1% decreases the economic value since 
we obtain V(R + ΔR) = 816.39. It follows that ΔV = −34.38. Using the duration-based 
approximation, we have³⁷: 

$$\begin{aligned}
\Delta V &\approx -\mathcal{D} \cdot V \cdot \Delta R \\
&= -4.273 \times 850.77 \times 1\% \\
&= -36.35
\end{aligned}$$

In the case of the continuous-time modeling framework, the results become V = 848.35, 
y = 3.61% and $\mathcal{D}$ = D = 4.422. If we consider a parallel shock of +1%, the exact value of 
ΔV is equal to −35.57, whereas the approximated value is equal to −37.51. In Table 7.16, we 
also report the results for a parallel shock of −1%. Moreover, we indicate the case where we 
stress the yield to maturity and not the yield curve. Because V(y + ΔR) ≠ V(R + ΔR), 
we observe a small difference between the approximation and the true value of ΔV. 


³⁷ This approximation is based on the assumption that the yield curve is flat. However, numerical 
experiments show that it is also valid when the term structure of interest rates is increasing or decreasing. 


TABLE 7.16: Impact of a parallel shift of the yield curve 


               | Discrete-time      | Continuous-time 
ΔR             | +1%       −1%      | +1%       −1% 
V(R + ΔR)      | 816.39    887.52   | 812.78    886.09 
ΔV             | −34.38    36.75    | −35.57    37.74 
V(y + ΔR)      | 815.64    888.42   | 811.94    887.02 
ΔV             | −35.13    37.64    | −36.41    38.67 
Approximation  | −36.35    36.35    | −37.51    37.51 
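
The discrete-time figures of Example 72 can be checked numerically. The following Python sketch is only an illustration: the function names and the bisection solver for the yield to maturity are our own assumptions, not part of the original example. It values the cash flows with the zero-coupon curve, solves for y, and compares the exact impact of a +1% parallel shift with the duration-based approximation.

```python
# Cash flow stream and zero-coupon rates of Example 72
times = [1, 4, 7, 11]
cash_flows = [200.0, 500.0, 200.0, 100.0]
zero_rates = [0.02, 0.03, 0.04, 0.05]


def value_from_curve(shift=0.0):
    """Present value using the zero-coupon curve shifted in parallel."""
    return sum(cf / (1.0 + r + shift) ** t
               for t, cf, r in zip(times, cash_flows, zero_rates))


def value_from_yield(y):
    """Present value when every cash flow is discounted at the same yield."""
    return sum(cf / (1.0 + y) ** t for t, cf in zip(times, cash_flows))


def yield_to_maturity(target, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection search (value_from_yield is decreasing in y)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value_from_yield(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


V = value_from_curve()
y = yield_to_maturity(V)

# Macaulay duration with weights discounted at the yield to maturity
macaulay = sum(t * cf / (1.0 + y) ** t for t, cf in zip(times, cash_flows)) / V
modified = macaulay / (1.0 + y)

dV_exact = value_from_curve(0.01) - V   # full revaluation after a +1% shift
dV_approx = -modified * V * 0.01        # first-order duration approximation

print(f"V = {V:.2f}, y = {y:.2%}, D = {macaulay:.3f}, modified D = {modified:.3f}")
print(f"exact dV = {dV_exact:.2f}, approximated dV = {dV_approx:.2f}")
```

The gap between the two values of ΔV comes from the curvature of the price-yield relationship, which the first-order approximation ignores.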


Remark 78 From a theoretical point of view, duration analysis is valid under the assumption 
that the term structure of interest rates is flat and the change in interest rates is a 
parallel shift. This framework can be extended by considering the convexity: 

$$\mathcal{C} = \frac{1}{V} \cdot \frac{\partial^2 V}{\partial y^2}$$

In this case, we obtain the following second-order approximation: 

$$\Delta V \approx -\mathcal{D} \cdot V \cdot \Delta y + \frac{1}{2} \mathcal{C} \cdot V \cdot (\Delta y)^2$$
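
To see what the convexity term adds, we can compare the first-order and second-order approximations with a full revaluation. The sketch below is a hedged illustration: it reuses the cash flows of Example 72, assumes a flat curve at the rounded yield y = 3.61%, and uses the analytical derivatives of the discrete-time pricing formula (the variable names are ours).

```python
times = [1, 4, 7, 11]
cash_flows = [200.0, 500.0, 200.0, 100.0]
y = 0.0361   # rounded yield to maturity reported in the example (assumption)
dy = 0.01    # parallel shock of +1%


def value(rate):
    """Present value with a flat yield."""
    return sum(cf / (1.0 + rate) ** t for t, cf in zip(times, cash_flows))


V = value(y)

# Analytical first and second derivatives of V with respect to y
dV_dy = sum(-t * cf / (1.0 + y) ** (t + 1) for t, cf in zip(times, cash_flows))
d2V_dy2 = sum(t * (t + 1) * cf / (1.0 + y) ** (t + 2)
              for t, cf in zip(times, cash_flows))

modified_duration = -dV_dy / V
convexity = d2V_dy2 / V

exact = value(y + dy) - V
first_order = -modified_duration * V * dy
second_order = first_order + 0.5 * convexity * V * dy ** 2

print(f"exact = {exact:.3f}")
print(f"first order = {first_order:.3f}, second order = {second_order:.3f}")
```

Because the convexity of a stream of positive cash flows is positive, the correction reduces the loss predicted by the duration alone when rates rise.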


7.2.1.2 Relationship between the duration gap and the equity duration 


Let V_j and D_j be the market value and the Macaulay duration associated to the j-th 
cash flow stream. Then, the market value of a portfolio that is composed of m cash flow 
streams is equal to the sum of individual market values: 

$$V = \sum_{j=1}^{m} V_j$$

while the duration of the portfolio is the weighted average of individual durations: 

$$D = \sum_{j=1}^{m} w_j \cdot D_j$$

where: 

$$w_j = \frac{V_j}{V}$$

This result is obtained by considering a common yield to maturity. 
We recall that E(t) = A(t) − L*(t) and EV_E = EV_A − EV_{L*}. Using the previous result, 
we deduce that the duration of equity is equal to: 

$$\begin{aligned}
D_E &= \frac{\mathrm{EV}_A}{\mathrm{EV}_A - \mathrm{EV}_{L^\star}} \cdot D_A - \frac{\mathrm{EV}_{L^\star}}{\mathrm{EV}_A - \mathrm{EV}_{L^\star}} \cdot D_{L^\star} \\
&= \frac{\mathrm{EV}_A}{\mathrm{EV}_A - \mathrm{EV}_{L^\star}} \cdot D_{\mathrm{Gap}}
\end{aligned} \qquad (7.9)$$

where the duration gap (also called DGAP) is defined as the difference between the duration 
of assets and the duration of pure liabilities scaled by the ratio EV_{L*} / EV_A: 

$$D_{\mathrm{Gap}} = D_A - \frac{\mathrm{EV}_{L^\star}}{\mathrm{EV}_A} \cdot D_{L^\star} \qquad (7.10)$$

Another expression of the equity duration is: 

$$D_E = \frac{\mathrm{EV}_A}{\mathrm{EV}_E} \cdot D_{\mathrm{Gap}} = L_{A/E} \cdot D_{\mathrm{Gap}} \qquad (7.11)$$

We notice that the equity duration is equal to the duration gap multiplied by the leverage 
ratio, where L_{A/E} is the ratio between the economic value of assets and the economic value 
of equity. 


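By definition of the modified duration, we have³⁸: 

$$\begin{aligned}
\Delta \mathrm{EVE} &= \Delta \mathrm{EV}_E \\
&\approx -\mathcal{D}_E \cdot \mathrm{EV}_E \cdot \Delta y \\
&= -D_E \cdot \mathrm{EV}_E \cdot \frac{\Delta y}{1 + y}
\end{aligned}$$

Using Equation (7.11), we deduce that: 

$$\Delta \mathrm{EVE} \approx -D_{\mathrm{Gap}} \cdot \mathrm{EV}_A \cdot \frac{\Delta y}{1 + y} \qquad (7.12)$$

Formulas (7.10) and (7.12) are well-known and are presented in many handbooks of risk 
management (Crouhy et al., 2013; Bessis, 2015). 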


7.2.1.3 An illustration 


We consider the following balance sheet: 


Assets       V_j   D_j  | Liabilities      V_j   D_j 
Cash           5   0.0  | Deposits          40   3.2 
Loans         40   1.5  | CDs               20   0.8 
Mortgages     40   6.0  | Debt              30   1.7 
Securities    15   3.8  | Equity capital    10 
Total        100        | Total            100 


We have EV_A = 100, EV_{L*} = 90 and EV_E = 10. We deduce that the leverage is equal to: 

$$L_{A/E} = \frac{\mathrm{EV}_A}{\mathrm{EV}_E} = \frac{100}{10} = 10$$

The duration of assets is equal to: 

$$D_A = \frac{5}{100} \times 0 + \frac{40}{100} \times 1.5 + \frac{40}{100} \times 6.0 + \frac{15}{100} \times 3.8 = 3.57 \text{ years}$$

For the pure liabilities, we obtain: 

$$D_{L^\star} = \frac{40}{90} \times 3.2 + \frac{20}{90} \times 0.8 + \frac{30}{90} \times 1.7 = 2.17 \text{ years}$$

It follows that the duration gap is equal to: 

$$D_{\mathrm{Gap}} = 3.57 - \frac{90}{100} \times 2.17 = 1.62 \text{ years}$$


³⁸ We recall that EVE is an alternative expression for designating EV_E. 

while the value D_E of the equity duration is 16.20 years. Since D_Gap is equal to 1.62 years, 
the average duration of assets exceeds the average duration of liabilities. This is generally 
the normal situation because of the bank’s liquidity transformation (borrowing short and 
lending long). In the table below, we have reported the impact of an interest rate shift on 
the economic value of equity when the yield to maturity is equal to 3%: 


Δy          | −2%      −1%      +1%       +2% 
ΔEVE        | 3.15     1.57     −1.57     −3.15 
ΔEVE / EVE  | 31.46%   15.73%   −15.73%   −31.46% 


Since the duration gap is positive, the economic value of equity decreases when interest 
rates increase, because assets will fall more than liabilities. For instance, an interest rate 
rise of 1% induces a negative variation of 1.57 in EVE. This impact is large and represents 
a relative variation of −15.73%. 
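
The whole illustration can be reproduced in a few lines. The Python sketch below is given under our own naming assumptions (dictionaries of (value, duration) pairs and helper functions that do not appear in the text); it applies Equations (7.10), (7.11) and (7.12) to the balance sheet above.

```python
# (economic value, Macaulay duration in years) for each balance sheet item
assets = {"Cash": (5.0, 0.0), "Loans": (40.0, 1.5),
          "Mortgages": (40.0, 6.0), "Securities": (15.0, 3.8)}
pure_liabilities = {"Deposits": (40.0, 3.2), "CDs": (20.0, 0.8),
                    "Debt": (30.0, 1.7)}


def total_value(book):
    return sum(value for value, _ in book.values())


def duration(book):
    """Value-weighted average of the individual durations."""
    return sum(value * dur for value, dur in book.values()) / total_value(book)


ev_assets = total_value(assets)
ev_liabilities = total_value(pure_liabilities)
ev_equity = ev_assets - ev_liabilities

d_assets = duration(assets)
d_liabilities = duration(pure_liabilities)

# Equation (7.10): duration gap
duration_gap = d_assets - ev_liabilities / ev_assets * d_liabilities
# Equation (7.11): equity duration = leverage x duration gap
leverage = ev_assets / ev_equity
equity_duration = leverage * duration_gap


def delta_eve(dy, y=0.03):
    """Equation (7.12): first-order impact of a parallel shift dy."""
    return -duration_gap * ev_assets * dy / (1.0 + y)


for dy in (-0.02, -0.01, 0.01, 0.02):
    impact = delta_eve(dy)
    print(f"dy = {dy:+.0%}: dEVE = {impact:+.2f} ({impact / ev_equity:+.2%})")
```

Changing the dictionary entries is enough to test how a different asset mix or funding structure modifies the gap.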


7.2.1.4 Immunization of the balance sheet 


In order to reduce the sensitivity of the bank balance sheet to interest rate changes, we 
have to reduce the value of |ΔEVE|. Using Equation (7.12), this is equivalent to controlling 
the value of the duration gap. In particular, a full immunization implies that: 

$$\begin{aligned}
\Delta \mathrm{EVE} = 0 &\Leftrightarrow D_{\mathrm{Gap}} = 0 \\
&\Leftrightarrow D_A - \frac{\mathrm{EV}_{L^\star}}{\mathrm{EV}_A} \cdot D_{L^\star} = 0
\end{aligned} \qquad (7.13)$$


If we consider the normal situation where the duration gap is positive, we have three solu- 
tions: 


1. we can reduce the duration of assets; 
2. we can increase the relative weight of the liabilities with respect to the assets; 
3. we can increase the duration of liabilities. 


Generally, it takes time to implement the first two solutions. For instance, reducing the 
duration of assets implies redefining the business model by reducing the average maturity 
of loans. It can be done by decreasing the part of mortgages and increasing the part of 
short-term loans (e.g. consumer credit or credit cards). In fact, the third solution is the 
easiest way to immunize the bank balance sheet to interest rate changes. For example, the 
bank can issue a long-term debt instrument. Therefore, hedging the balance sheet involves 
managing the borrowing program of the bank. 


Let us consider the previous example. We found D_A = 3.57 and EV_{L*} / EV_A = 90/100. It follows 
that the optimal value of the liability duration must be equal to 3.97 years: 

$$D_{\mathrm{Gap}} = 0 \Leftrightarrow D_{L^\star} = \frac{100}{90} \times 3.57 = 3.97 \text{ years}$$

We assume that the bank issues a 10-year zero-coupon bond by reducing its current debt 
amount. The notional N of the zero-coupon bond must then satisfy this equation: 

$$\frac{40}{90} \times 3.2 + \frac{20}{90} \times 0.8 + \frac{30 - N}{90} \times 1.7 + \frac{N}{90} \times 10 = 3.97$$

or: 

$$N = \frac{3.97 \times 90 - (40 \times 3.2 + 20 \times 0.8 + 30 \times 1.7)}{10 - 1.7} = 19.52$$

The value 19.52 is obtained with the unrounded target duration 357/90 ≈ 3.9667. 


TABLE 7.17: Bank balance sheet after immunization of the duration gap 


Assets       V_j   D_j  | Liabilities        V_j     D_j 
Cash           5   0.0  | Deposits           40      3.2 
Loans         40   1.5  | CDs                20      0.8 
Mortgages     40   6.0  | Debt               10.48   1.7 
Securities    15   3.8  | Zero-coupon bond   19.52   10.0 
                        | Equity capital     10      0.0 
Total        100        | Total              100 


After immunization, the duration of equity is equal to zero and we obtain the balance sheet 
given in Table 7.17. 
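
The notional of the zero-coupon bond can also be computed directly. The following sketch is an illustration under our own assumptions (variable names, and the choice of working with the unrounded target duration); it solves the linear equation for N and checks that the duration gap vanishes after the debt swap.

```python
ev_assets = 100.0
d_assets = 3.57

# (economic value, duration) of the pure liabilities before immunization
deposits = (40.0, 3.2)
cds = (20.0, 0.8)
debt = (30.0, 1.7)
zero_coupon_duration = 10.0

ev_liabilities = deposits[0] + cds[0] + debt[0]

# Target liability duration such that the duration gap is zero
target_duration = ev_assets / ev_liabilities * d_assets

# Replacing N of debt (duration 1.7) by N of zero-coupon bond (duration 10)
# adds N * (10 - 1.7) to the value-weighted sum of durations
weighted_sum = sum(value * dur for value, dur in (deposits, cds, debt))
notional = (target_duration * ev_liabilities - weighted_sum) / (
    zero_coupon_duration - debt[1])

# Check: liability duration and duration gap after immunization
new_liabilities = [deposits, cds,
                   (debt[0] - notional, debt[1]),
                   (notional, zero_coupon_duration)]
new_duration = sum(v * d for v, d in new_liabilities) / ev_liabilities
new_gap = d_assets - ev_liabilities / ev_assets * new_duration

print(f"target duration = {target_duration:.4f} years")
print(f"N = {notional:.2f}, remaining debt = {debt[0] - notional:.2f}")
print(f"duration gap after immunization = {new_gap:.6f}")
```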


Remark 79 The duration gap analysis covers the gap risk, which is the first-order source 
of interest rate risk. It is not adapted for measuring basis and option risks. For these two 
risks, we need to use the repricing analysis. 


7.2.2 Earnings-at-risk 


Earnings-at-risk assesses potential future losses due to a change in interest rates over 
a specified period. Several measures of earnings can be used: accounting earnings, interest 
margins, commercial margins, etc. For interest rate scenarios, we can use predefined³⁹, 
historical or Monte Carlo scenarios. Once earnings distributions are obtained, we can analyze 
the results for each scenario, derive the most severe scenarios, compute a value-at-risk, etc. 
In this section, we first focus on the income gap analysis, which is the equivalent of the 
duration gap analysis when analyzing interest rate income risks. Then we present the tools 
for calculating the net interest income (NII). Finally, we consider hedging strategies in the 
context where both ΔEVE and NII risk measures are managed. 


7.2.2.1 Income gap analysis 


Definition of the gap  While ΔEVE measures the price risk of the balance sheet, ΔNII 
measures the earnings risk of the income statement. It refers to the risk of changes in the 
interest rates on assets and liabilities from the point of view of the net income. Indeed, if 
interest rates change, this induces a gap (or repricing) risk because the bank will have to 
reinvest assets and refinance liabilities at a different interest rate level in the future. The gap 
is defined as the difference between rate sensitive assets (RSA) and rate sensitive liabilities 
(RSL): 

$$\mathrm{GAP}(t, u) = \mathrm{RSA}(t, u) - \mathrm{RSL}(t, u) \qquad (7.14)$$

where t is the current date and u is the time horizon of the gap⁴⁰. While ΔEVE considers 
all the cash flows, ΔNII is generally calculated using a short-term time horizon, for example 
the next quarter or the next year. Therefore, rate sensitive assets/liabilities correspond to 
assets/liabilities that will mature or reprice before the time horizon of the gap. This is why 
the interest rate gap risk is also called the repricing risk or the reset risk. 

³⁹ Such as the six scenarios of the standardized IRRBB approach. 
⁴⁰ This means that h = u − t is the maturity of the gap. 

In order to calculate the interest rate gap, the bank must decide which items are rate 
sensitive. This includes two main categories. The first one corresponds to items that mature 
before the time horizon t + h, whereas the second one corresponds to floating rate instruments. 
For example, consider the following balance sheet expressed in millions of dollars: 


Assets                    Amount | Liabilities               Amount 
Loans                            | Deposits 
  Less than 1 year           200 |   Non-maturity deposits      150 
  1 to 2 years               100 |   Money market deposits      250 
  Greater than 2 years       100 | Term deposits 
Mortgages                        |   Fixed rate                 250 
  Fixed rate                 100 |   Variable rate              100 
  Variable rate              350 | Borrowings 
Securities                       |   Less than 1 year            50 
  Fixed rate                  50 |   Greater than 1 year        100 
Physical assets              100 | Capital                      100 
Total                       1000 | Total                       1000 


If the time horizon of the gap is set to one year, the rate sensitive assets are loans with 
maturities of less than one year (200) and variable rate mortgages (350), while the rate 
sensitive liabilities are money market deposits (250), variable rate term deposits (100) and 
borrowings with maturities of less than one year (50). Therefore, we can split the balance 
sheet between rate sensitive, fixed rate and non-earning assets: 


Assets           Amount | Liabilities      Amount 
Rate sensitive      550 | Rate sensitive      400 
Fixed rate          350 | Fixed rate          500 
Non-earning         100 | Non-earning         100 


We deduce that the one-year gap is equal to $150 million: 
GAP (t,t + 1) = 550 — 400 = 150 


Approximation of ΔNII  We consider the following definition of the net interest income: 

$$\begin{aligned}
\mathrm{NII}(t, u) = {} & \mathrm{RSA}(t, u) \cdot R_{\mathrm{RSA}}(t, u) + \mathrm{NRSA}(t, u) \cdot R_{\mathrm{NRSA}}(t, u) - {} \\
& \mathrm{RSL}(t, u) \cdot R_{\mathrm{RSL}}(t, u) - \mathrm{NRSL}(t, u) \cdot R_{\mathrm{NRSL}}(t, u)
\end{aligned}$$

where NRSA and NRSL denote assets and liabilities that are not rate sensitive and R_C(t, u) 
is the average interest rate for the category C and the maturity date u. We have: 

$$\Delta \mathrm{NII}(t, u) = \mathrm{NII}(t + h, u + h) - \mathrm{NII}(t, u)$$

By considering a static gap⁴¹, we deduce that: 

$$\begin{aligned}
\Delta \mathrm{NII}(t, u) = {} & \mathrm{RSA}(t, u) \cdot \left(R_{\mathrm{RSA}}(t + h, u + h) - R_{\mathrm{RSA}}(t, u)\right) + {} \\
& \mathrm{NRSA}(t, u) \cdot \left(R_{\mathrm{NRSA}}(t + h, u + h) - R_{\mathrm{NRSA}}(t, u)\right) - {} \\
& \mathrm{RSL}(t, u) \cdot \left(R_{\mathrm{RSL}}(t + h, u + h) - R_{\mathrm{RSL}}(t, u)\right) - {} \\
& \mathrm{NRSL}(t, u) \cdot \left(R_{\mathrm{NRSL}}(t + h, u + h) - R_{\mathrm{NRSL}}(t, u)\right)
\end{aligned}$$

Since interest income and interest expense do not change for fixed rate assets and liabilities 
between t and t + h, that is R_NRSA(t + h, u + h) = R_NRSA(t, u) and R_NRSL(t + h, u + h) = 
R_NRSL(t, u), we have: 

$$\Delta \mathrm{NII}(t, u) = \mathrm{RSA}(t, u) \cdot \Delta R_{\mathrm{RSA}}(t, u) - \mathrm{RSL}(t, u) \cdot \Delta R_{\mathrm{RSL}}(t, u)$$

⁴¹ This means that RSA(t + h, u + h) = RSA(t, u), NRSA(t + h, u + h) = NRSA(t, u), 
RSL(t + h, u + h) = RSL(t, u) and NRSL(t + h, u + h) = NRSL(t, u) where h = u − t. 

By assuming that the impact of interest rate changes is the same for rate sensitive assets 
and liabilities, we finally obtain: 

$$\Delta \mathrm{NII}(t, u) \approx \mathrm{GAP}(t, u) \cdot \Delta R \qquad (7.15)$$

where ΔR is the parallel shock of interest rates. Income gap analysis is then described by 
Equations (7.14) and (7.15). 

For instance, if we consider the previous example, the one-year gap is equal to $150 
million and we have the following impact on the income: 


ΔR    | −2%      −1%        0%   +1%        +2% 
ΔNII  | −$3 mn   −$1.5 mn   0    +$1.5 mn   +$3 mn 


If interest rates rise by 2%, the bank expects that its income increases by $3 million. On 
the contrary, the loss can be equal to $3 million if interest rates fall by 2%. 
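
The income gap computation above can be sketched as follows. The item labels and the boolean flag marking rate sensitive items over a one-year horizon are our own encoding of the balance sheet; the flag values follow the classification given in the text.

```python
# (item, amount in $ mn, rate sensitive within one year?)
assets = [
    ("Loans, less than 1 year", 200.0, True),
    ("Loans, 1 to 2 years", 100.0, False),
    ("Loans, greater than 2 years", 100.0, False),
    ("Mortgages, fixed rate", 100.0, False),
    ("Mortgages, variable rate", 350.0, True),
    ("Securities, fixed rate", 50.0, False),
    ("Physical assets", 100.0, False),
]
liabilities = [
    ("Non-maturity deposits", 150.0, False),
    ("Money market deposits", 250.0, True),
    ("Term deposits, fixed rate", 250.0, False),
    ("Term deposits, variable rate", 100.0, True),
    ("Borrowings, less than 1 year", 50.0, True),
    ("Borrowings, greater than 1 year", 100.0, False),
    ("Capital", 100.0, False),
]


def rate_sensitive_amount(items):
    """Sum of the amounts flagged as rate sensitive."""
    return sum(amount for _, amount, sensitive in items if sensitive)


rsa = rate_sensitive_amount(assets)
rsl = rate_sensitive_amount(liabilities)
gap = rsa - rsl  # Equation (7.14)


def delta_nii(shock):
    """Equation (7.15): income impact of a parallel shock on interest rates."""
    return gap * shock


for shock in (-0.02, -0.01, 0.0, 0.01, 0.02):
    print(f"dR = {shock:+.0%}: dNII = {delta_nii(shock):+.1f} $ mn")
```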


Remark 80 The previous analysis is valid for a given maturity h = u − t. For example, 
ΔNII(t, t + 0.25) measures the impact for the next three months while ΔNII(t, t + 1) measures 
the impact for the next year. It is common to consider the change in income for a 
given time period [u_1, u_2] where u_1 = t + h_1 and u_2 = t + h_2. We notice that: 

$$\begin{aligned}
\Delta \mathrm{NII}(t, u_1, u_2) &= \Delta \mathrm{NII}(t, u_2) - \Delta \mathrm{NII}(t, u_1) \\
&= \left(\mathrm{GAP}(t, u_2) - \mathrm{GAP}(t, u_1)\right) \cdot \Delta R \\
&= \mathrm{GAP}(t, u_1, u_2) \cdot \Delta R \\
&= \left(\mathrm{RSA}(t, u_1, u_2) - \mathrm{RSL}(t, u_1, u_2)\right) \cdot \Delta R
\end{aligned}$$

where GAP(t, u_1, u_2), RSA(t, u_1, u_2) and RSL(t, u_1, u_2) are respectively the static gap, rate 
sensitive assets and rate sensitive liabilities for the period [u_1, u_2]. 


7.2.2.2 Net interest income 


Definition  We recall that the net interest income of the bank is the difference between 
interest rate revenues of its assets and interest rate expenses of its liabilities: 

$$\mathrm{NII}(t, u) = \sum_{i \in \mathrm{Assets}} N_i(t, u) \cdot R_i(t, u) - \sum_{j \in \mathrm{Liabilities}} N_j(t, u) \cdot R_j(t, u) \qquad (7.16)$$

where NII(t, u) is the net interest income at time t for the maturity date u, N_i(t, u) is the 
notional outstanding at time u for the instrument i and R_i(t, u) is the associated interest 
rate. This formula is similar to the approximated equation presented above, but it is based 
on a full repricing model. However, this formula is static and assumes a run-off balance 
sheet. In order to be more realistic, we can assume a dynamic balance sheet. However, the 
computation of the net interest income is then more complex because it requires modeling 
the liquidity gap and also behavioral options. 


An example  We consider a simplified balance sheet with the following asset and liability 
positions: 


• The asset position is made up of two bullet loans A and B, whose remaining maturity 
is respectively equal to 18 months and 2 years. The outstanding notional of each loan 
is equal to 500. Moreover, we assume that the interest rate is equal to 6% for the first 
loan and 5% for the second loan. 


• The liability position is made up of a bullet debt instrument C, whose remaining 
maturity is 1 year and outstanding notional is 800. We assume that the interest rate 
is equal to 3%. 


• The equity capital is equal to 200. 


To calculate the net interest income, we calculate the interest rate revenues and costs. By 
assuming a quarterly pricing, the quarterly incomes of the instruments are: 

$$\begin{aligned}
I_A &= \frac{1}{4} \times 6\% \times 500 = 7.50 \\
I_B &= \frac{1}{4} \times 5\% \times 500 = 6.25 \\
I_C &= \frac{1}{4} \times 3\% \times 800 = 6.00
\end{aligned}$$

We obtain the interest income schedule given in Table 7.18. 


TABLE 7.18: Interest income schedule and liquidity gap 


u − t        | 0.25    0.50    0.75    1.00    1.25    1.50    1.75    2.00 
Loan A       | 7.50    7.50    7.50    7.50    7.50    7.50 
Loan B       | 6.25    6.25    6.25    6.25    6.25    6.25    6.25    6.25 
IR revenues  | 13.25   13.25   13.25   13.25   13.25   13.25   6.25    6.25 
Debt C       | 6.00    6.00    6.00    6.00 
Equity       | 0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00 
IR expenses  | 6.00    6.00    6.00    6.00    0.00    0.00    0.00    0.00 
NII(t, u)    | 7.25    7.25    7.25    7.25    13.25   13.25   6.25    6.25 
LG(t, u)     | 0       0       0       0       −800    −800    −300    −300 


However, calculating the net interest income as the simple difference between interest rate 
revenues and expenses ignores the fact that the balance sheet is unbalanced. In the last row 
in Table 7.18, we have reported the liquidity gap. At time u = t + 1.25, the value of the 
liabilities is equal to 200 because the borrowing has matured. It follows that the liquidity 
gap is equal to −800. At time u = t + 1.75, the loan A will have matured. In this case, the 
liabilities are made up of the equity capital whereas the assets are made up of the loan B. 
We deduce that the liquidity gap is equal to 200 − 500 = −300. 


TABLE 7.19: Balance sheet under the constraint of a zero liquidity gap 


              u − t   | 1.25   1.50   1.75   2.00 
Approach #1   Debt D  | 500    500 
              Debt E  | 300    300    300    300 
Approach #2   Loan F  |               500    500 
              Debt G  | 800    800    800    800 


At this stage, we can explore several approaches to model the net interest income, and 
impose a zero liquidity gap. In the first approach, the bank borrows 500 for the period 
[t + 1, t + 1.50] and 300 for the period [t + 1, t + 2]. This corresponds to debt instruments 
D and E in Table 7.19. We note $\tilde{R}_L(t, u)$ the interest rate for these new liabilities. We 
notice that $\tilde{R}_L(t, u)$ is a random variable at time t, because it will be known at time t + 1. 
We have: 

$$\begin{aligned}
\mathrm{NII}(t, u) &= 13.25 - \frac{1}{4} \times 800 \times \tilde{R}_L(t, u) \\
&= 13.25 - \frac{1}{4} \times 800 \times \left(\tilde{R}_L(t, u) - 3\%\right) - \frac{1}{4} \times 800 \times 3\% \\
&= 7.25 - 200 \times \left(\tilde{R}_L(t, u) - 3\%\right)
\end{aligned}$$

for u = t + 1.25 and u = t + 1.5, and: 

$$\mathrm{NII}(t, u) = 6.25 - \frac{1}{4} \times 300 \times \tilde{R}_L(t, u) = 4.00 - 75 \times \left(\tilde{R}_L(t, u) - 3\%\right)$$

for u = t + 1.75 and u = t + 2.0. 

The drawback of the previous approach is that the size of the balance sheet has been 
dramatically reduced for the two last dates. This situation is not realistic, because it assumes 
that the assets are not replaced by the new production. This is why it is better to consider 
that Loan A is rolled into Loan F, and the debt instrument C is replaced by the debt 
instrument G (see Table 7.19). In this case, we obtain: 

$$\begin{aligned}
\mathrm{NII}(t, u) &= 6.25 + \frac{1}{4} \times 500 \times \tilde{R}_A(t, u) - \frac{1}{4} \times 800 \times \tilde{R}_L(t, u) \\
&= 6.25 + \frac{1}{4} \times 500 \times \left(\tilde{R}_A(t, u) - 6\%\right) + \frac{1}{4} \times 500 \times 6\% - {} \\
&\quad\; \frac{1}{4} \times 800 \times \left(\tilde{R}_L(t, u) - 3\%\right) - \frac{1}{4} \times 800 \times 3\% \\
&= 7.25 + \frac{1}{4} \times 500 \times \left(\tilde{R}_A(t, u) - 6\%\right) - \frac{1}{4} \times 800 \times \left(\tilde{R}_L(t, u) - 3\%\right)
\end{aligned}$$

If we note $\Delta R_L = \tilde{R}_L(t, u) - 3\%$ and $\Delta R_A = \tilde{R}_A(t, u) - 6\%$, we obtain the following 
figures⁴²: 


ΔR_A      | −2%     −1%    0%     1%     2%    | −2%    −2%    −2.5% 
ΔR_L      | −2%     −1%    0%     1%     2%    | 0%     −1%    0.0% 
t + 1.00  | 7.25    7.25   7.25   7.25   7.25  | 7.25   7.25   7.25 
t + 1.25  | 11.25   9.25   7.25   5.25   3.25  | 7.25   9.25   7.25 
t + 1.50  | 11.25   9.25   7.25   5.25   3.25  | 7.25   9.25   7.25 
t + 1.75  | 8.75    8.00   7.25   6.50   5.75  | 4.75   6.75   4.13 
t + 2.00  | 8.75    8.00   7.25   6.50   5.75  | 4.75   6.75   4.13 


The case ΔR_L = ΔR_A is equivalent to using the income gap analysis. However, this approach 
is simple and approximate. It does not take into account the maturity of the instruments 
and the dynamics of the yield curve. Let us consider a period of falling interest rates. We 
assume that the yield of assets is equal to the short interest rate plus 2% on average while 
the cost of liabilities is generally equal to the short interest rate plus 1%. On average, the 
bank captures a net interest margin (NIM) of 1%. This means that the market interest rate 
was equal to 5% for Loan A, 4% for Loan B and 2% for Debt C. We can then think that 
Loan A has been issued a long time ago whereas Debt C is more recent. If the interest 
rate environment stays at 2%, we have $\tilde{R}_A(t, u) = 4\%$ and $\tilde{R}_L(t, u) = 3\%$, which implies 
that ΔR_A = 4% − 6% = −2% and ΔR_L = 3% − 3% = 0%. We obtain the results given 
in the seventh column. We can also explore other interest rate scenarios or other business 
scenarios. For instance, the bank may be safer than before, meaning that the spread paid to 
the market is lower (eighth column) or the bank may have an aggressive loan issuing model, 
implying that the interest rate margin is reduced (ninth column). 


⁴² We have NII(t, t + 1) = 7.25, NII(t, t + 1.25) = NII(t, t + 1.5) = 7.25 − 200 × ΔR_L and 
NII(t, t + 1.75) = NII(t, t + 2) = 7.25 + 125 × ΔR_A − 200 × ΔR_L. 
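
The scenario table can be regenerated from the formulas reported in the footnote. The sketch below takes the baseline quarterly income of 7.25 and the sensitivities 200 = 800/4 and 125 = 500/4 as given by the text; the function name and the scenario labels are our own assumptions, and the formulas only cover horizons between one and two years.

```python
BASE_NII = 7.25            # quarterly NII taken from Table 7.18 as given
DEBT_SENSITIVITY = 200.0   # 1/4 x 800, notional of the refinanced debt
LOAN_SENSITIVITY = 125.0   # 1/4 x 500, notional of the rolled loan


def projected_nii(horizon, d_ra, d_rl):
    """Quarterly NII at date t + horizon (in years, between 1 and 2).

    d_ra and d_rl are the shocks on the new asset and liability rates.
    """
    if horizon <= 1.00:
        return BASE_NII                               # nothing has repriced yet
    if horizon <= 1.50:
        return BASE_NII - DEBT_SENSITIVITY * d_rl     # only the debt is refinanced
    return BASE_NII + LOAN_SENSITIVITY * d_ra - DEBT_SENSITIVITY * d_rl


scenarios = {
    "parallel -2%": (-0.02, -0.02),
    "parallel +2%": (0.02, 0.02),
    "rates stay at 2%": (-0.02, 0.00),
    "lower funding spread": (-0.02, -0.01),
    "aggressive lending": (-0.025, 0.00),
}

for name, (d_ra, d_rl) in scenarios.items():
    row = [projected_nii(h, d_ra, d_rl) for h in (1.00, 1.25, 1.50, 1.75, 2.00)]
    print(f"{name:22s}", " ".join(f"{x:6.2f}" for x in row))
```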


Remark 81 The previous analysis gives the impression that the net interest income is 
known for u < t+ 1.5 and stochastic after. In fact, this is not true. Indeed, we notice that 
the interest rates of Loans A and B are equal to 6% and 5% whereas the current interest 
rates are around 2%. Therefore, we can anticipate that the bank will be subject to prepayment 
issues. Our analysis does not take into account the behavior of clients and the impact of 
embedded options in the net interest income⁴³. 


Mathematical formulation  We reiterate that the net interest income is equal to: 

$$\mathrm{NII}(t, u) = \sum_{i \in \mathrm{Assets}} N_i(t, u) \cdot R_i(t, u) - \sum_{j \in \mathrm{Liabilities}} N_j(t, u) \cdot R_j(t, u)$$

If we consider a future date t′ > t, we have: 

$$\begin{aligned}
\mathrm{NII}(t', u) = {} & \sum_{i \in \mathrm{Assets}} N_i(t', u) \cdot R_i(t', u) - \sum_{j \in \mathrm{Liabilities}} N_j(t', u) \cdot R_j(t', u) - {} \\
& \left( \sum_{i \in \mathrm{Assets}} N_i(t', u) - \sum_{j \in \mathrm{Liabilities}} N_j(t', u) \right) \cdot R(t', u)
\end{aligned}$$

The future NII requires the projection of the new production and the forecasting of asset 
and liability rates (or customer rates). The third term represents the liquidity gap that 
must be financed or placed⁴⁴. In what follows, we assume that the future liquidity gap is 
equal to zero in order to obtain tractable formulas. 


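Since we have the identity ΔNII(t′, u) = GAP(t′, u) · ΔR, we deduce that: 

$$\begin{aligned}
\mathrm{GAP}(t', u) &= \frac{\Delta \mathrm{NII}(t', u)}{\Delta R} \\
&= \sum_{i \in \mathrm{Assets}} N_i(t', u) \cdot \left( \frac{\Delta R_i(t', u)}{\Delta R} - 1 \right) - \sum_{j \in \mathrm{Liabilities}} N_j(t', u) \cdot \left( \frac{\Delta R_j(t', u)}{\Delta R} - 1 \right)
\end{aligned}$$

If we consider a continuous-time analysis where u = t′ + dt, we obtain: 

$$\mathrm{GAP}(t', u) = \sum_{i \in \mathrm{Assets}} N_i(t', u) \cdot \left( \frac{\partial R_i(t', u)}{\partial R} - 1 \right) - \sum_{j \in \mathrm{Liabilities}} N_j(t', u) \cdot \left( \frac{\partial R_j(t', u)}{\partial R} - 1 \right)$$

where R represents the market interest rate⁴⁵. Demey et al. (2003) consider two opposite 
situations corresponding to two categories of asset/liability rates: 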


⁴³ This issue is analyzed in the third section of this chapter on page 427. 
⁴⁴ The borrowing/lending interest rate is denoted by R(t′, u). 
⁴⁵ We recall that the gap analysis assumes a flat yield curve. 

C1  The asset/liability rates are deterministic and independent from market interest rates: 

$$\frac{\partial R_i(t', u)}{\partial R} = \frac{\partial R_j(t', u)}{\partial R} = 0$$

This category corresponds to contractual rates that are generally fixed. 


C2  The asset/liability rates depend on market interest rates: 

$$\begin{aligned}
R_i(t', u) &= R + m_A \\
R_j(t', u) &= R + m_L
\end{aligned}$$

where m_A and m_L are the commercial margins for assets and liabilities. It follows 
that: 

$$\frac{\partial R_i(t', u)}{\partial R} = \frac{\partial R_j(t', u)}{\partial R} = 1$$

This category generally concerns floating rates that are based on a market reference 
rate plus a spread. 


We deduce that the gap is the difference between liabilities and assets that belong to the 
first category C1: 

$$\mathrm{GAP}(t', u) = \sum_{\substack{j \in \mathrm{Liabilities} \\ j \in C_1}} N_j(t', u) - \sum_{\substack{i \in \mathrm{Assets} \\ i \in C_1}} N_i(t', u)$$

Modeling customer rates Until now, we have used the variable R for defining the 
general level of interest rates and AR for defining a parallel shock on the yield curve. 
However, this definition is not sufficiently precise to understand the real nature of R. In 
fact, the study of client rates is essential to understand which interest rate is important for 
calculating earnings-at-risk measures. In what follows, we introduce the notation R(t) = 
R(t,t+ dt) and R(u) = R(u,u+du). The current date or the agreement date is denoted 
by t while u > t is a future date. 


We have already distinguished fixed rates and floating (or variable) rates. By definition, a 
fixed rate must be known and constant when the agreement is signed between the customer 
and the bank: 

R(u) = R* = R(t) 


On the contrary, the customer rate is variable if: 

$$\Pr\{R(u) = R(t)\} < 1$$

In this case, the customer rate is a random variable at time t and depends on a reference 
rate, which is generally a market rate. Mathematically, we can write: 

$$\begin{aligned}
R(u) &= R(t) \cdot \mathbb{1}\{u < \tau\} + R(\tau) \cdot \mathbb{1}\{u \geq \tau\} \\
&= R^\star \cdot \mathbb{1}\{u < \tau\} + R(\tau) \cdot \mathbb{1}\{u \geq \tau\}
\end{aligned} \qquad (7.17)$$

where τ is the time at which the customer rate will change. τ is also called the next repricing 
date. For some products, τ is known while it may be stochastic in some situations⁴⁶. If R(τ) 
is a function of a market rate, we can write: 

$$R(\tau) = f(\tau, r(\tau))$$


⁴⁶ When the repricing date is known, it is also called the reset date. 

We use the notation r(τ), because the market rate is generally a short-term interest rate. 
If we assume a linear relationship (noted $\mathcal{H}_{\mathrm{linear}}$), we have: 

$$R(\tau) = \rho \cdot r(\tau) + m \qquad (7.18)$$

where ρ is the correlation between the customer rate and the market rate and m is related 
to the commercial margin⁴⁷. This is the simplest way for modeling R(τ), but there are some 
situations where the relationship is more complex. For example, Demey et al. (2003) study 
the case where the customer rate has a cap: 

$$R(\tau) = r(\tau) \cdot \mathbb{1}\{r(\tau) < r^+\} + r^+ \cdot \mathbb{1}\{r(\tau) \geq r^+\} + m$$

where r⁺ + m is the cap. 

Another challenge for modeling R(u) is the case where the next repricing date τ is 
unknown. We generally assume that τ is exponentially distributed with parameter λ. If we 
consider the linear relationship (7.18), it follows that the expected customer rate is: 

$$\begin{aligned}
\bar{R}(u) &= \mathbb{E}[R(u)] \\
&= R^\star \cdot e^{-\lambda (u - t)} + (\rho \cdot r(u) + m) \cdot \left(1 - e^{-\lambda (u - t)}\right)
\end{aligned} \qquad (7.19)$$
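
Equation (7.19) is a weighted average between the contractual rate and the repriced rate, the weight being the probability that no repricing has occurred before u. The sketch below illustrates this under our own naming assumptions; the numerical inputs are purely hypothetical and are not taken from the text.

```python
import math


def expected_customer_rate(contract_rate, rho, market_rate, margin, lam, horizon):
    """Equation (7.19) with horizon = u - t and an exponential repricing date.

    contract_rate is R*, market_rate is r(u), margin is m and lam is the
    intensity of the exponential distribution.
    """
    no_repricing = math.exp(-lam * horizon)   # probability that tau > u
    repriced_rate = rho * market_rate + margin
    return contract_rate * no_repricing + repriced_rate * (1.0 - no_repricing)


# Hypothetical inputs: contract at 5%, market rate at 2%, rho = 80%, m = 1%
for horizon in (0.0, 1.0, 5.0, 20.0):
    rate = expected_customer_rate(0.05, 0.8, 0.02, 0.01, lam=0.5, horizon=horizon)
    print(f"u - t = {horizon:4.1f}: expected customer rate = {rate:.3%}")
```

For u = t, the expected rate is the contractual rate, while it converges to ρ · r(u) + m when the horizon becomes large.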


Sometimes, the relationship between the customer rate and the market rate is not instan- 
taneous. For instance, Demey et al. (2003) consider the case where the customer rate is an 


average of the market rate over a window period h. Therefore, Equation (7.19) becomes⁴⁸: 


Riu) = Ree MO 4p af (Ø -r (8) +1) -e AC) ds 
u—h 


Let us go back to the problem of determining the parallel shock ΔR. Using Equation 
(7.17), we have: 

$$\Delta R = R(u) - R(t) = \begin{cases} 0 & \text{if } u < \tau \\ R(\tau) - R^\star & \text{otherwise} \end{cases}$$

Under the assumption $\mathcal{H}_{\mathrm{linear}}$, we deduce that: 

$$\Delta R = R(\tau) - R^\star = \rho \cdot \Delta r \qquad (7.20)$$

where Δr = r(τ) − r(t) is the shock on the market rate. We notice that modeling the 
net interest income variation requires determining ρ and Δr. In the case where ρ = 0, 
we retrieve the previous result that ΔNII is not sensitive to fixed rate items. Otherwise, 
Equation (7.20) shows that interest rate gaps must be conducted on a contract by contract 
basis or at least for each reference rate: 


“Floating-rate interest gaps can be defined for all floating-rate references (1- 
month Libor, 1-year Libor, etc.). These floating-rate gaps are not fungible: they 
cannot be aggregated unless assuming a parallel shift of all rates” (Bessis, 2015, 
page 47). 


⁴⁷ The commercial margin is equal to: 

$$R(\tau) - r(\tau) = m - (1 - \rho) \cdot r(\tau)$$

When the correlation is equal to one, m is equal to the commercial margin, otherwise it is greater. 
⁴⁸ We assume that u − h ≥ t. 




418 Handbook of Financial Risk Management 


Indeed, two contracts may have two different correlations with the same reference rate, and 
two contracts may have two different reference rates. 

Equation (7.20) is valid only if we assume that the next repricing date is known. If $\tau$ is
stochastic, Demey et al. (2003) obtain the following formula:

$$\Delta R(u) = \mathbb{E}\left[(R(u) - R(t)) \cdot \mathbb{1}\{u \geq \tau\}\right] = \rho \cdot \Delta r \cdot \Pr\{\tau \leq u\}$$

We conclude that the sensitivity of the customer rate to the market rate is equal to:

$$\rho(t,u) = \frac{\Delta R(u)}{\Delta r} = \rho \cdot \Pr\{\tau \leq u\}$$

It depends on two parameters: the correlation $\rho$ between the two rates and the probability
distribution of the repricing date $\tau$. If $\tau$ follows an exponential distribution with parameter
$\lambda$, we have $\rho(t,u) = \rho \left(1 - e^{-\lambda (u-t)}\right)$. We verify that $\rho(t,u) \leq \rho$. The upper limit case
$\rho(t,u) = \rho$ is reached in the deterministic case (no random repricing), whereas the function
$\rho(t,u)$ is equal to zero if $\rho$ is equal to zero (no correlation). By definition of the exponential
distribution, the average time between two repricing dates is equal to $1/\lambda$. In Figure 7.13,
we have reported the function $\rho(t,u)$ for three values of the correlation: 0%, 50% and 100%.
We show how $\lambda$ impacts the sensitivity $\rho(t,u)$ and therefore $\Delta \mathrm{NII}$. This last parameter is
particularly important when we consider embedded options and customer behavior⁴⁹. For
instance, $\lambda = 0.1$ implies that the contract is repriced every ten years on average (top/left
panel). It is obvious that the sensitivity is lower for this contract than for a contract that
is repriced every 2 years (top/right panel).
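The formula for the exponential case can be checked numerically. The short Python sketch below is only an illustration: the function name `sensitivity` and the chosen horizons are assumptions made for this example, not part of the original text.

```python
import math


def sensitivity(rho: float, lam: float, horizon: float) -> float:
    """Sensitivity rho(t, u) = rho * Pr{tau <= u} when the repricing date tau
    is exponentially distributed with parameter lam; horizon is u - t in years."""
    if math.isinf(lam):
        return rho  # immediate repricing: the full correlation is passed through
    return rho * (1.0 - math.exp(-lam * horizon))


# Illustrative horizons (in years), chosen for this example only
for lam in (0.1, 0.5):
    values = [round(sensitivity(1.0, lam, h), 3) for h in (1, 5, 10, 20)]
    print(f"lambda = {lam}: {values}")
```

For a given horizon, the contract with the larger $\lambda$ (more frequent repricing) always has the higher sensitivity, and the sensitivity never exceeds $\rho$.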


FIGURE 7.13: Sensitivity of the customer rate with respect to the market rate
(Panels include λ = 0.1 (10 years), λ = 0.5 (2 years) and λ = ∞ (today); curves: ρ = 100%, 50% and 0%; horizontal axis: u − t.)


49 See Section 7.3 on page 427.


7.2.2.3 Hedging strategies


Hedging is not an easy question. There is no single optimal solution, but several possible
answers. Moreover, the problem becomes even more complicated once we integrate the
behavioral and embedded options.


To hedge or not to hedge  Since the net interest income is sensitive to interest rate
changes, it is important to define a hedging policy and to understand how it may impact
the income statement of the bank. Let us define the hedged net interest income as the sum
of the net interest income and the hedge P&L:

$$\mathrm{NII}_H(t,u) = \mathrm{NII}(t,u) + H(t,u)$$

In order to obtain a tractable formula of the hedge P&L $H(t,u)$, we consider a forward rate
agreement (FRA), which is a contract exchanging the future interest rate $r(u)$ at
the pricing date $u$ for the current forward rate $f(t,u)$ at the maturity date $u$. The hedge
P&L is then:

$$H(t,u) = N_H(t,u) \cdot (f(t,u) - r(u))$$

where $N_H(t,u)$ is the notional of the hedging strategy. We deduce that:

$$\begin{aligned} \Delta \mathrm{NII}_H(t,u) &= \Delta \mathrm{NII}(t,u) + \Delta H(t,u) \\ &= \mathrm{GAP}(t,u) \cdot \Delta R(u) - N_H(t,u) \cdot \Delta r(u) \\ &= (\mathrm{GAP}(t,u) \cdot \rho(t,u) - N_H(t,u)) \cdot \Delta r(u) \end{aligned}$$

because we have $\Delta R(u) = \rho(t,u) \cdot \Delta r(u)$. The variation of the hedged NII is equal to zero
if the notional of the hedge is equal to the product of the interest rate gap and the sensitivity $\rho(t,u)$:

$$\Delta \mathrm{NII}_H(t,u) = 0 \Leftrightarrow N_H(t,u) = \mathrm{GAP}(t,u) \cdot \rho(t,u)$$

In this case, we obtain:

$$\mathrm{NII}_H(t,u) - \mathrm{NII}(t,u) = \mathrm{GAP}(t,u) \cdot \rho(t,u) \cdot (f(t,u) - r(u))$$

We can draw several conclusions from the above mathematical framework:


• When the correlation between the customer rate and the market rate is equal to one,
the notional of the hedge is exactly equal to the interest rate gap. Otherwise, it is
lower.

• When the interest rate gap is closed, the bank does not need to hedge the net interest
income.

• If the bank hedges the net interest income, the difference $\mathrm{NII}_H(t,u) - \mathrm{NII}(t,u)$ is
positive if the gap and the difference between $f(t,u)$ and $r(u)$ have the same sign.
For example, if the gap is positive, a decrease of interest rates is not favorable. This
implies that the hedged NII is greater than the non-hedged NII only if the forward
rate $f(t,u)$ is greater than the future market rate $r(u)$. This situation is equivalent
to anticipating that the forward rate is overestimated.
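The hedging relationships above can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the numerical inputs (a gap of 100, a sensitivity of 80%, a forward rate of 3% and a realized rate of 2%) are hypothetical and chosen only for illustration.

```python
def hedge_notional(gap: float, sensitivity: float) -> float:
    """Notional N_H that cancels the variation of the hedged NII: GAP * rho(t, u)."""
    return gap * sensitivity


def hedged_nii_variation(gap: float, sensitivity: float,
                         notional: float, delta_r: float) -> float:
    """Variation of the hedged NII: (GAP * rho(t, u) - N_H) * delta_r."""
    return (gap * sensitivity - notional) * delta_r


def hedge_pnl(notional: float, forward_rate: float, realized_rate: float) -> float:
    """Hedge P&L of the FRA position: N_H * (f(t, u) - r(u))."""
    return notional * (forward_rate - realized_rate)


# Hypothetical inputs, for illustration only
gap, rho_tu = 100.0, 0.8
n_h = hedge_notional(gap, rho_tu)

# With this notional, a parallel shock of -100 bps leaves the hedged NII unchanged
print(hedged_nii_variation(gap, rho_tu, n_h, -0.01))

# The hedge is profitable here because the forward rate exceeds the realized rate
print(hedge_pnl(n_h, forward_rate=0.03, realized_rate=0.02))
```

A partial hedge corresponds to choosing a notional between zero and `hedge_notional(gap, rho_tu)`, in which case the hedged NII keeps a residual exposure to the rate shock.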


We conclude that hedging the interest rate gap is not systematic and depends on the
expectations of the bank. It is extremely rare that the bank fully hedges the net interest
income. The other extreme situation, where the NII is fully exposed to interest rate changes,
is also not very common. Generally, the bank prefers partial hedging. Moreover,
we reiterate that the previous analysis is based on numerous assumptions⁵⁰. Therefore, it is
pointless to compute a precise hedging strategy because of these approximations. This is why
banks prefer to put in place macro hedging strategies with a limited number of instruments.


Hedging instruments  In order to hedge the interest rate gap, the bank uses interest
rate derivatives. They may be classified into two categories: those that hedge linear interest
rate risks and those that hedge non-linear interest rate risks. The first category is made
up of interest rate swaps (IRS) and forward rate agreements (FRA), while the second
category concerns options such as caps, floors and swaptions. An IRS is a swap where two
counterparties exchange a fixed rate against a floating rate or two floating rates. It is
certainly the most widely used hedging instrument in asset and liability management. The
fixed rate is calibrated such that the initial value of the swap is equal to zero, meaning that
the cost of entering into an IRS is low. This explains the popularity of IRS among ALM
managers. However, like FRAs, these hedging instruments only cover linear changes in
interest rates. In general, the ALM manager does not fully close all the interest
rate gaps because this is not the purpose of a macro hedging strategy. In practice, two or
three maturities are sufficient to substantially reduce the risk.


Remark 82 In order to hedge non-linear risks (slope of the yield curve, embedded options, 
etc.), the bank may use options. However, they are more expensive than IRS and are much 
less used by banks. One of the difficulties is the high degree of uncertainty around customer 
behavioral modeling. 


7.2.3 Simulation approach 


We present here a general top-down econometric-based simulation framework in order 
to model the dynamics of the outstanding amount for the different items of the balance 
sheet. The underlying idea is that these items respond differently to key economic and 
market variables. The focus is then to model the earnings-at-risk profile of these items. 
The different profiles can also be aggregated in order to understand the income risk of each 
business line of the bank. 


The framework is based on the cointegration theory and error correction models⁵¹. It is
made up of two econometric models. We first begin by modeling the economic and market
variables $x(t) = (x_1(t), \ldots, x_m(t))$ with a VECM:

$$\Phi_x(L) \, \Delta x(t) = \Pi_x \, x(t-1) + \varepsilon_x(t) \tag{7.21}$$

where $\Phi_x(L) = I_m - \Phi_1 L - \ldots - \Phi_p L^p$ is the lag polynomial and $\varepsilon_x(t) \sim \mathcal{N}(0, \Sigma_x)$. By
definition, Equation (7.21) is valid if we have verified that each component of $x(t)$ is
integrated of order one. The choice of the number p of lags is important. Generally, we consider
a monthly econometric model, where the variables $x(t)$ are the economic growth $g(t)$, the
inflation rate $\pi(t)$, the short-term market rate $r(t)$, the long-term interest rate $R(t)$, etc.
In practice, p = 3 is used in order to have a quarterly relationship between economic and
market variables. The goal of this first econometric model is to simulate joint scenarios $S_x$
of the economy and the market. Each scenario is represented by the current values of $x(t)$
and the future paths of $x(t+h)$:

$$S_x = \{x(t+h) = (x_1(t+h), \ldots, x_m(t+h)), \; h = 0, 1, 2, \ldots\} \tag{7.22}$$


50 They concern the sensitivity to market rates, the behavior of customers, the new production, the
interest rate shocks, etc.
51 They are developed in Section 10.2.3 on page 655.

These scenarios do not necessarily correspond to extreme shocks, but they model the
probability distribution of all future outcomes.


The second step consists in relating the growth of the outstanding amount $y_i(t)$ of item
i to the variables $x(t)$. For instance, let us assume that:

$$y_i(t) = y_i(t-1) + 0.7 \times g(t) - 0.3 \times \pi(t)$$

This means that an economic growth of 1% implies that the outstanding amount of item i
will increase by 70 bps, while the inflation has a negative impact on $y_i(t)$. The first idea is
then to consider an ARX(q) model:

$$y_i(t) = \sum_{k=1}^{q} \phi_{i,k} \, y_i(t-k) + \sum_{j=1}^{m} \beta_{i,j} \, x_j(t) + \varepsilon_i(t)$$

However, this type of model has two drawbacks. It assumes that the current value of $y_i(t)$
is related to the current value of $x_j(t)$ and there are no substitution effects between the
different items of the balance sheet. This is why it is better to consider again a VECM
approach with exogenous variables:

$$\Phi_y(L) \, \Delta y(t) = \Pi_y \, y(t-1) + B_1 x(t) + B_2 \Delta x(t) + \varepsilon_y(t) \tag{7.23}$$

where $y(t) = (y_1(t), \ldots, y_n(t))$ and $\varepsilon_y(t) \sim \mathcal{N}(0, \Sigma_y)$. In this case, the current value of
$y_i(t)$ is related to the current value of $x(t)$, the monthly variation $\Delta x(t)$ and the growth of
the outstanding amount of the other items. Generally, the number q of lags is less than p.
Indeed, the goal of the model (7.23) is to include short-term substitution effects between the
different items whereas long-term substitution effects are more explained by the dynamics
of economic and market variables.


Once the model (7.23) is estimated, we can simulate the future values of the outstanding
amount for the different items with respect to the scenario $S_x$ of the exogenous variables:

$$S_y \mid S_x = \{y(t+h) = (y_1(t+h), \ldots, y_n(t+h)), \; h = 0, 1, 2, \ldots\}$$

This framework allows us to go beyond the static gap analysis of interest rates, because the
outstanding amounts are stochastic. For example, Figure 7.14 shows an earnings-at-risk
analysis of the net interest income for the next six months. For each month, we report the
median of NII and the 90% confidence interval.
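The logic of an earnings-at-risk report can be sketched with a toy Monte Carlo simulation. The Python code below is not the VECM framework itself: as a loudly simplified assumption, the market rate follows a mean-reverting AR(1) process, the outstanding amount reacts to rate changes through a single hypothetical coefficient `beta`, and the NII is an assumed constant margin applied to the balance. All parameter values and function names are illustrative.

```python
import random
import statistics


def simulate_nii_paths(n_paths=2000, n_months=6, seed=0):
    """Simulate monthly NII paths under purely illustrative dynamics."""
    rng = random.Random(seed)
    r0, r_bar, kappa, sigma_r = 0.02, 0.02, 0.10, 0.002  # hypothetical rate dynamics
    y0, beta, sigma_y = 100.0, -0.5, 0.005               # hypothetical balance dynamics
    margin = 0.015                                       # assumed annual margin rate
    paths = []
    for _ in range(n_paths):
        r, y, path = r0, y0, []
        for _ in range(n_months):
            # Stand-in for the scenario S_x: one step of the market rate
            dr = kappa * (r_bar - r) + sigma_r * rng.gauss(0.0, 1.0)
            r += dr
            # Stand-in for S_y | S_x: balance growth reacts to the rate change
            y *= 1.0 + beta * dr + sigma_y * rng.gauss(0.0, 1.0)
            path.append(y * margin / 12.0)  # monthly NII of the item
        paths.append(path)
    return paths


def earnings_at_risk(paths, level=0.90):
    """For each month, return (median, lower bound, upper bound) of the NII."""
    report = []
    for h in range(len(paths[0])):
        values = sorted(path[h] for path in paths)
        lower = values[int((1.0 - level) / 2.0 * (len(values) - 1))]
        upper = values[int((1.0 + level) / 2.0 * (len(values) - 1))]
        report.append((statistics.median(values), lower, upper))
    return report


report = earnings_at_risk(simulate_nii_paths())
for month, (median, lower, upper) in enumerate(report, start=1):
    print(f"month {month}: median = {median:.4f}, 90% CI = [{lower:.4f}, {upper:.4f}]")
```

The confidence interval widens with the horizon because the shocks accumulate along each path, which is the qualitative pattern expected from an earnings-at-risk analysis.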


Remark 83 The previous framework can be used for assessing a given scenario, for
example a parallel shock of interest rates. By construction, it will not give the same result as
the income gap analysis, because the latter does not take into account the feedback effects
of interest rates on the outstanding amount.


7.2.4 Funds transfer pricing 


According to Bessis (2015), the main objective of funds transfer pricing systems is to 
exchange funds and determine the profit allocation between business units. This means that 
all liquidity and interest rate risks are transferred to the ALM unit, which is in charge of 
managing them. Business units can then lend or borrow funding at a given internal price. 
This price is called the funds transfer price or the internal transfer rate, and is denoted by 
FTP. For example, the FTP charges interest to the business unit for client loans, whereas
the FTP compensates the business unit for raising deposits. This implies that the balance
sheet of the different business units is immunized to changes of market rates, and the internal
transfer rates determine the net interest income of each business unit.

FIGURE 7.14: Earnings-at-risk analysis


7.2.4.1 Net interest and commercial margins 


The net interest margin (NIM) is equal to the net interest income divided by the amount
of assets:

$$\mathrm{NIM}(t,u) = \frac{\sum_{i \in \mathrm{Assets}} N_i(t,u) \cdot R_i(t,u) - \sum_{j \in \mathrm{Liabilities}} N_j(t,u) \cdot R_j(t,u)}{\sum_{i \in \mathrm{Assets}} N_i(t,u)}$$

Let $\mathrm{RA}(t,u)$ and $\mathrm{RL}(t,u)$ be the interest earning assets and interest bearing liabilities (or
asset and liability amounts that are sensitive to interest rates). Another expression of the
NIM is:

$$\mathrm{NIM}(t,u) = \frac{\mathrm{RA}(t,u) \cdot R_{\mathrm{RA}}(t,u) - \mathrm{RL}(t,u) \cdot R_{\mathrm{RL}}(t,u)}{\mathrm{RA}(t,u)}$$

where $R_{\mathrm{RA}}$ and $R_{\mathrm{RL}}$ represent the weighted average interest rates of interest earning assets
and interest bearing liabilities. The net interest margin differs from the net interest spread
(NIS), which is the difference between interest earning rates and interest bearing rates:

$$\mathrm{NIS}(t,u) = \frac{\sum_{i \in \mathrm{Assets}} N_i(t,u) \cdot R_i(t,u)}{\sum_{i \in \mathrm{Assets}} N_i(t,u)} - \frac{\sum_{j \in \mathrm{Liabilities}} N_j(t,u) \cdot R_j(t,u)}{\sum_{j \in \mathrm{Liabilities}} N_j(t,u)} = R_{\mathrm{RA}}(t,u) - R_{\mathrm{RL}}(t,u)$$

Example 73 We consider the following interest earning and bearing items:

Assets       N_i(t,u)   R_i(t,u)   |   Liabilities   N_j(t,u)   R_j(t,u)
Loans        100        5%         |   Deposits      100        0.5%
Mortgages    100        4%         |   Debts         60         2.5%

The interest income is equal to 100 × 5% + 100 × 4% = 9 and the interest expense is
100 × 0.5% + 60 × 2.5% = 2. We deduce that the net interest income is equal to 9 − 2 = 7.
Moreover, we obtain⁵² $\mathrm{RA}(t,u) = 200$, $R_{\mathrm{RA}}(t,u) = 4.5\%$, $\mathrm{RL}(t,u) = 160$ and $R_{\mathrm{RL}}(t,u) = 1.25\%$.
We deduce that:

$$\mathrm{NIM}(t,u) = \frac{200 \times 4.5\% - 160 \times 1.25\%}{200} = \frac{7}{200} = 3.5\%$$

and:

$$\mathrm{NIS}(t,u) = 4.5\% - 1.25\% = 3.25\%$$

The net interest margin and spread are expressed in percent. NIM is the profitability ratio
of the assets whereas NIS is the interest rate spread captured by the bank.
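The figures of Example 73 can be reproduced with a few lines of Python. The dictionary layout and the helper names below are assumptions made for this illustration; only the notionals and rates come from the example.

```python
# Items of Example 73: name -> (notional, client rate)
assets = {"Loans": (100.0, 0.05), "Mortgages": (100.0, 0.04)}
liabilities = {"Deposits": (100.0, 0.005), "Debts": (60.0, 0.025)}


def total_notional(items):
    """Sum of the notionals of a set of balance sheet items."""
    return sum(notional for notional, _ in items.values())


def total_interest(items):
    """Sum of notional times rate over a set of balance sheet items."""
    return sum(notional * rate for notional, rate in items.values())


ra, rl = total_notional(assets), total_notional(liabilities)
nii = total_interest(assets) - total_interest(liabilities)

# Net interest margin: NII divided by the amount of (interest earning) assets
nim = nii / ra
# Net interest spread: difference between the two weighted average rates
nis = total_interest(assets) / ra - total_interest(liabilities) / rl

print(f"NII = {nii:.2f}, NIM = {nim:.2%}, NIS = {nis:.2%}")
```

The two ratios differ because the NIM divides the interest expense by the asset amount (200), whereas the NIS divides it by the liability amount (160).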


Remark 84 In Figure 7.15, we have reported the average net interest margin in % for all
US banks from 1984 to 2019. The average NIM was equal to 3.36% at the end of the first
quarter of 2019. During the last 35 years, the average value is equal to 3.78%, the maximum
of 4.91% was reached in Q1 1994, whereas the minimum of 2.95% was observed in Q1
2015.


FIGURE 7.15: Evolution of the net interest margin in the US
(Vertical axis: NIM in %; horizontal axis: year.)

Source: Federal Financial Institutions Examination Council (US), Net Interest Margin for all US
Banks [USNIM], retrieved from FRED, Federal Reserve Bank of St. Louis;
https://fred.stlouisfed.org/series/USNIM, July 9, 2019.


52 We have:

$$R_{\mathrm{RA}}(t,u) = \frac{100 \times 5\% + 100 \times 4\%}{100 + 100} = 4.5\%$$

and:

$$R_{\mathrm{RL}}(t,u) = \frac{100 \times 0.5\% + 60 \times 2.5\%}{100 + 60} = 1.25\%$$

Let us now see how to calculate the commercial margin rate. A first idea is to approximate
it by the net interest margin or the net interest spread. However, these quantities are
calculated at the global level of the bank, not at the level of a business unit and even less
at the level of a product. Let us consider an asset i. From a theoretical point of view, the
commercial margin rate is the spread between the client rate of this asset $R_i(t,u)$ and the
corresponding market rate $r(t,u)$:

$$m_i(t,u) = R_i(t,u) - r(t,u)$$

Here, we assume that $R_i(t,u)$ and $r(t,u)$ have the same maturity u. If we consider a liability
j, we obtain a similar formula:

$$m_j(t,u) = r(t,u) - R_j(t,u)$$

In this framework, we assume that the business unit borrows at the market rate $r(t,u)$
in order to finance the asset i or lends to the market at the same rate $r(t,u)$. A positive
commercial margin rate implies that $R_i(t,u) > r(t,u)$ and $r(t,u) > R_j(t,u)$. In the case
where we can perfectly match the asset i with the liability j, the commercial margin rate
is the net interest spread:

$$m(t,u) = m_i(t,u) + m_j(t,u) = R_i(t,u) - R_j(t,u)$$
As already mentioned, a funds transfer pricing system amounts to interposing the ALM unit
between the business unit and the market. In the case of assets, we decompose the commercial
margin rate of the bank as follows:

$$m_i(t,u) = R_i(t,u) - r(t,u) = \underbrace{(R_i(t,u) - \mathrm{FTP}_i(t,u))}_{m_i^{(c)}(t,u)} + \underbrace{(\mathrm{FTP}_i(t,u) - r(t,u))}_{m_i^{(t)}(t,u)}$$

where $m_i^{(c)}(t,u)$ and $m_i^{(t)}(t,u)$ are the commercial margin rate of the business unit and the
transformation margin rate of the ALM unit. For liabilities, we also have:

$$m_j(t,u) = m_j^{(c)}(t,u) + m_j^{(t)}(t,u) = (\mathrm{FTP}_j(t,u) - R_j(t,u)) + (r(t,u) - \mathrm{FTP}_j(t,u))$$

The goal of FTP is then to lock the commercial margin rate $m_i^{(c)}(t,u)$ (or $m_j^{(c)}(t,u)$) over
the lifetime of the product contract.


Let us consider Example 73. The FTP for the loans and the mortgages is equal to 3%,
while the FTP for deposits is equal to 1.5% and the FTP for debts is equal to 2.5%. If we
assume that the market rate is equal to 2.5%, we obtain the following results:

Assets       m_i^(c)(t,u)   m_i^(t)(t,u)   |   Liabilities   m_j^(c)(t,u)   m_j^(t)(t,u)
Loans        2%             0.5%           |   Deposits      1.0%           1.0%
Mortgages    1%             0.5%           |   Debts         0.0%           0.0%

It follows that the commercial margin of the bank is equal to:

$$M^{(c)} = 100 \times 2\% + 100 \times 1\% + 100 \times 1\% + 60 \times 0\% = 4$$

For the transformation margin, we have:

$$M^{(t)} = 100 \times 0.5\% + 100 \times 0.5\% + 100 \times 1.0\% + 60 \times 0\% = 2.0$$

We do not have $M^{(c)} + M^{(t)} = \mathrm{NII}$ because assets and liabilities are not balanced:

$$\mathrm{NII} - \left(M^{(c)} + M^{(t)}\right) = (\mathrm{RA}(t,u) - \mathrm{RL}(t,u)) \cdot r(t,u) = 40 \times 2.5\% = 1$$

In fact, in a funds transfer pricing system, the balance sheet issue is the problem of the ALM
unit. It is also interesting to notice that we can now calculate the commercial margin of each
product: $M^{(c)}_{\mathrm{Loans}} = 2$, $M^{(c)}_{\mathrm{Mortgages}} = 1$ and $M^{(c)}_{\mathrm{Deposits}} = 1$. We can then aggregate them by
business units. For example, if the business unit is responsible for loans and deposits, its
commercial margin is equal to 3.
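The decomposition of Example 73 into commercial and transformation margins can be verified numerically. In the Python sketch below, the data layout and the helper names are assumptions made for illustration; the notionals, client rates, FTPs and market rate are those of the example.

```python
market_rate = 0.025

# name -> (notional, client rate, funds transfer price)
assets = {"Loans": (100.0, 0.05, 0.03), "Mortgages": (100.0, 0.04, 0.03)}
liabilities = {"Deposits": (100.0, 0.005, 0.015), "Debts": (60.0, 0.025, 0.025)}


def asset_margin_rates(client_rate, ftp, r):
    """Commercial and transformation margin rates of an asset."""
    return client_rate - ftp, ftp - r


def liability_margin_rates(client_rate, ftp, r):
    """Commercial and transformation margin rates of a liability."""
    return ftp - client_rate, r - ftp


commercial, transformation = {}, {}
for name, (notional, rate, ftp) in assets.items():
    m_c, m_t = asset_margin_rates(rate, ftp, market_rate)
    commercial[name], transformation[name] = notional * m_c, notional * m_t
for name, (notional, rate, ftp) in liabilities.items():
    m_c, m_t = liability_margin_rates(rate, ftp, market_rate)
    commercial[name], transformation[name] = notional * m_c, notional * m_t

total_commercial = sum(commercial.values())
total_transformation = sum(transformation.values())
nii = (sum(n * rate for n, rate, _ in assets.values())
       - sum(n * rate for n, rate, _ in liabilities.values()))

# The residual is the funding of the unbalanced part of the balance sheet
residual = nii - (total_commercial + total_transformation)
print(total_commercial, total_transformation, residual)
```

Summing `commercial["Loans"]` and `commercial["Deposits"]` gives the margin of a business unit responsible for these two products.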


7.2.4.2 Computing the internal transfer rates 


Since the business unit knows the internal prices of funding, the commercial margin
rates are locked and the commercial margin has a smooth profile. The business unit can
then focus on its main objective, which is selling products, without losing time managing
interest rate and liquidity risks. However, for the business unit to do its job correctly, the
internal prices must be fair. The determination of FTPs is then crucial because it has a direct
impact on the net income of the business unit. A system of arbitrary or wrong prices can lead
to a false analysis of the income allocation, where some business units appear to be highly
profitable when the exact opposite is true. The consequence is then a wrong allocation of
resources and capital.


The reference rate  If we consider the transformation margin rate, we have $m_i^{(t)}(t,u) = \mathrm{FTP}_i(t,u) - r(t,u)$.
The internal prices are fair if the corresponding mark-to-market is
equal to zero on average, because the goal of FTP is to smooth the net interest income of
each business unit and to allocate efficiently the net interest income between the different
business units. For a contract with a bullet maturity, this implies that:

$$\mathrm{FTP}_i(t,u) = \mathbb{E}[r(t,u)]$$

The transformation margin can then be interpreted as an interest rate swap⁵³ receiving a
fixed leg $\mathrm{FTP}_i(t,u)$ and paying a floating leg $r(t,u)$. It follows that the funds transfer price
is equal to the market swap rate at the initial date t with the same maturity as the asset
item i (Demey et al., 2003).


In practice, it is impossible to have funds transfer prices that depend on the initial
date and the maturity of each contract. Let us first assume that the bank uses the short
market rate $r(u)$ for determining the funds transfer prices and considers globally the new
production $\mathrm{NP}(t)$ instead of the different individual contracts. The mark-to-market of the
transformation margin then satisfies the following equation:

$$\mathbb{E}_t\left[\int_t^{\infty} B(t,u) \, \mathrm{NP}(t) \, S(t,u) \, (\mathrm{FTP}(t,u) - r(u)) \, \mathrm{d}u\right] = 0$$

53 In the case of liabilities, the transformation margin is an interest rate swap paying the fixed leg
$\mathrm{FTP}_j(t,u)$ and receiving the floating leg $r(t,u)$.

As noticed by Demey et al. (2003), we need another constraint to determine explicitly the
internal transfer rate, because the previous equation is not sufficient. For instance, if we
assume that the internal transfer rate is constant over the lifetime of the new production,
i.e. $\mathrm{FTP}(t,u) = \mathrm{FTP}(t)$, we obtain:

$$\mathrm{FTP}(t) = \frac{\mathbb{E}_t\left[\int_t^{\infty} B(t,u) \, S(t,u) \, r(u) \, \mathrm{d}u\right]}{\mathbb{E}_t\left[\int_t^{\infty} B(t,u) \, S(t,u) \, \mathrm{d}u\right]}$$

The drawback of this approach is that the commercial margin is not locked, and the business
unit is exposed to the interest rate risk. On the contrary, we can assume that the commercial
margin rate of the business unit is constant:

$$R(u) - \mathrm{FTP}(t,u) = m$$

Demey et al. (2003) show that⁵⁴:

$$\mathrm{FTP}(t,u) = R(u) + \frac{\mathbb{E}_t\left[\int_t^{\infty} B(t,s) \, S(t,s) \, (r(s) - R(s)) \, \mathrm{d}s\right]}{\mathbb{E}_t\left[\int_t^{\infty} B(t,s) \, S(t,s) \, \mathrm{d}s\right]}$$
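The constant funds transfer price can be read as a weighted average of future short rates, with weights given by the discount factor times the amortization function. The Python sketch below illustrates this under strong simplifying assumptions that are not in the original text: rates are deterministic (so the expectation disappears), the discount factor is $B(t,u) = \exp\left(-\int_t^u r(s) \, \mathrm{d}s\right)$, the amortization function is exponential, $S(t,u) = e^{-\lambda (u-t)}$, and the integrals are truncated at a finite horizon and computed with the trapezoidal rule. The function name and all parameter values are hypothetical.

```python
import math


def constant_ftp(rate_fn, amort_rate, horizon=50.0, n_steps=20000):
    """Constant FTP as the B * S weighted average of the short rate.

    rate_fn(x) is the (deterministic) short rate at time t + x, and
    amort_rate is the hazard rate of an exponential amortization function.
    """
    dt = horizon / n_steps
    numerator = denominator = 0.0
    cumulative_rate = 0.0          # integral of the short rate from t to u
    prev_rate = rate_fn(0.0)
    prev_weight = 1.0              # B(t, t) * S(t, t) = 1
    for k in range(1, n_steps + 1):
        x = k * dt
        rate = rate_fn(x)
        cumulative_rate += 0.5 * (prev_rate + rate) * dt
        weight = math.exp(-cumulative_rate) * math.exp(-amort_rate * x)
        # Trapezoidal rule for both integrals
        numerator += 0.5 * (prev_weight * prev_rate + weight * rate) * dt
        denominator += 0.5 * (prev_weight + weight) * dt
        prev_rate, prev_weight = rate, weight
    return numerator / denominator


def flat_curve(x):
    """Hypothetical flat short rate of 3%."""
    return 0.03


def upward_curve(x):
    """Hypothetical short rate rising by 10 bps per year from 2%."""
    return 0.02 + 0.001 * x


print(constant_ftp(flat_curve, amort_rate=0.2))
print(constant_ftp(upward_curve, amort_rate=1.0))   # fast amortization
print(constant_ftp(upward_curve, amort_rate=0.1))   # slow amortization
```

With a flat curve the transfer price equals the market rate, while with an upward-sloping curve a slower amortization puts more weight on distant rates and raises the transfer price.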


The term structure of funds transfer prices According to Bessis (2015), there are 
two main approaches for designing a funds transfer pricing system: cash netting and central 
cash pool systems. In the first case, the business unit transfers to the ALM unit only the net 
cash balance, meaning that the internal transfer rates apply only to a fraction of asset and 
liability items. This system presents a major drawback, because business units are exposed 
to interest rate and liquidity risks. On the contrary, all funding and investment items are 
transferred into the ALM book in the second approach. In this case, all items have their 
own internal transfer rate. In order to reduce the complexity of the FTP system, assets 
and liabilities are generally classified into homogeneous pools in terms of maturity, credit, 
etc. In this approach, each pool has its own FTP. For example, the reference rate of long 
maturity pools is a long-term market rate while the reference rate of short maturity pools 
is a short-term market rate. In Figure 7.16, we have represented the term structure of the 
FTPs. Previously, we have seen that the reference rate is the market swap rate, meaning 
that the reference curve is the IRS curve. In practice, the FTP curve will differ from the 
IRS curve for several reasons. For instance, the reference curve can be adjusted by adding a 
credit spread in order to reflect the credit-worthiness of the bank, a bid-ask spread in order 
to distinguish assets and liabilities, a behavior-based spread because of prepayment and 
embedded options, and a liquidity spread. Therefore, we can decompose the funds transfer 
price as follows: 


$$\mathrm{FTP}(t,u) = \mathrm{FTP}^{\mathrm{IR}}(t,u) + \mathrm{FTP}^{\mathrm{liquidity}}(t,u) + \mathrm{FTP}^{\mathrm{other}}(t,u)$$

where $\mathrm{FTP}^{\mathrm{IR}}(t,u)$ is the interest rate component, $\mathrm{FTP}^{\mathrm{liquidity}}(t,u)$ is the liquidity component
and $\mathrm{FTP}^{\mathrm{other}}(t,u)$ corresponds to the other components. The FTP curve can then
differ from the IRS curve for the reasons presented above. But it can also be different
because of business or ALM decisions. For instance, if the bank would like to increase its
mortgage market share, it can reduce the client rate $R_i(t,u)$, meaning that the commercial
margin $m_i^{(c)}(t,u)$ decreases, or it can maintain the commercial margin by reducing the
internal transfer rate $\mathrm{FTP}_i(t,u)$. Another example concerns the investment maturity of
retail deposits. Each time this maturity is revised, it has a big impact on the retail business
unit because a shorter maturity will reduce the internal transfer price and a longer maturity
will increase the internal transfer price. Therefore, the FTP of deposits highly impacts the
profitability of the retail business unit.

54 Using this formulation, we can show the following results:

• for a loan with a fixed rate, the funds transfer price is exactly the swap rate with the same maturity
as the loan and the same amortization scheme as the new production;

• if the client rate $R(u)$ is equal to the short-term market rate $r(u)$, the funds transfer price $\mathrm{FTP}(t,u)$
is also equal to $r(u)$.

FIGURE 7.16: The term structure of FTP rates


7.3 Behavioral options 


In this section, we focus on three behavioral options that make it difficult to calculate 
liquidity and interest rate risks. They have been clearly identified by the BCBS (2016d) 
and concern non-maturity deposits, prepayment risk and redemption (or early termination) 
issues. For NMDs, the challenge is to model the deposit volume and the associated implicit 
duration. For the two other risks, the goal is to calculate prepayment rates and redemption 
ratios on a yearly basis. 


7.3.1 Non-maturity deposits 


Let us assume that the deposit balance of the client A is equal to $500. In this case,
we can assume that the duration of this deposit is equal to zero days, because the client
could withdraw her deposit volume today. Let us now consider 1,000 clients, whose deposit
balance is equal to $500. On average, we observe that the probability of withdrawing $500 at
once is equal to 50%. The total amount S that may be withdrawn today is then between $0
and $500,000. However, it is absurd to think that the duration of deposits is equal to zero,
because the probability that $500,000 are withdrawn is $0.5^{1000}$, which is less than $10^{-300}$! Since we have
$\Pr\{S > 275\,000\} < 0.1\%$, we can decide that 55% of the deposit balance has a duration
of zero days, 24.75% has a duration of one day, 11.14% has a duration of two days, etc. It
follows that the duration of deposits depends on the average behavior of customers and the
number of account holders, but many other parameters may have an impact on non-maturity
deposits. From a contractual point of view, deposits have a very short-term duration. From
a statistical point of view, we notice that a part of these deposits are in fact very stable
because of the law of large numbers.
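The orders of magnitude in this example can be checked directly. The Python sketch below assumes, as in the text, 1,000 independent clients who each withdraw $500 with probability 50%; the function names are hypothetical. The duration schedule applies the 55% stress fraction to the balance remaining at the start of each day.

```python
from math import comb

n_clients = 1000  # each client holds $500 and withdraws with probability 1/2


def tail_probability(threshold_clients: int) -> float:
    """Pr{X > threshold_clients} for X ~ Binomial(n_clients, 1/2),
    computed with exact integer arithmetic before the final division."""
    favourable = sum(comb(n_clients, k)
                     for k in range(threshold_clients + 1, n_clients + 1))
    return favourable / 2 ** n_clients


def duration_schedule(stress_fraction: float, n_days: int) -> list:
    """Share of the initial balance assigned to each duration bucket
    (zero days, one day, two days, ...)."""
    remaining, schedule = 1.0, []
    for _ in range(n_days):
        schedule.append(stress_fraction * remaining)
        remaining *= 1.0 - stress_fraction
    return schedule


# Withdrawing more than $275,000 means that more than 550 clients withdraw
p_tail = tail_probability(550)
print(f"Pr{{S > 275000}} = {p_tail:.6f}")

# Probability that all the clients withdraw on the same day
print(0.5 ** n_clients)

print([round(share, 4) for share in duration_schedule(0.55, 3)])
```

The schedule reproduces the 55%, 24.75% and 11.14% buckets of the text, since each bucket is 55% of what remains after the previous ones.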


NMDs are certainly the most difficult balance sheet item to model. There
are multiple reasons. The first reason is the non-specification of a maturity in the contract.
The second reason is that NMDs are the most liquid instruments and their transaction
costs are equal to zero, implying that subscriptions and redemptions are very frequent.
This explains why the volume of deposits is the most volatile among the different banking
products at the individual level. Another reason is the large number of embedded options
that creates significant gamma and vega option risks (Blöchlinger, 2015). Finally, the volume
of NMDs is very sensitive to the monetary policy (Bank of Japan, 2014), because NMDs
are part of the M1 money supply, but also of the M2 money supply. Indeed, NMDs are
made up of demand deposits (including overnight deposits and checkable accounts) and
savings accounts. M1 captures demand deposits (and also currency in circulation) while
M2 − M1 is an approximation of savings accounts. In what follows, we do not make a
distinction between the different types of NMDs, but it is obvious that the bank must
distinguish demand deposits and savings accounts in practice. Generally, academics model
behavioral options related to NMDs by analyzing substitution effects between NMDs and
term deposits. In real life, demand-side substitution is more complex since it also concerns
the cross-effects between demand deposits and savings accounts.


7.3.1.1 Static and dynamic modeling 


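In the case of non-maturity deposits, it is impossible to make the distinction between
the entry dates. This means that the stock amortization function $S^{\star}(t,u)$ must be equal to
the amortization function $S(t,u)$ of the new production. This implies that the hazard rate
$\lambda(t,u)$ of the amortization function $S(t,u)$ does not depend on the entry date t:

$$\lambda(t,u) = \lambda(u)$$

Indeed, we have by definition:

$$S(t,u) = \exp\left(-\int_t^u \lambda(s) \, \mathrm{d}s\right)$$

and we verify that⁵⁵:

$$S^{\star}(t,u) = \frac{\int_{-\infty}^{t} \mathrm{NP}(s) \, S(s,u) \, \mathrm{d}s}{\int_{-\infty}^{t} \mathrm{NP}(s) \, S(s,t) \, \mathrm{d}s} = S(t,u)$$

55 This result is based on the following computation:

$$S^{\star}(t,u) = \frac{\int_{-\infty}^{t} \mathrm{NP}(s) \, e^{-\int_s^u \lambda(v) \, \mathrm{d}v} \, \mathrm{d}s}{\int_{-\infty}^{t} \mathrm{NP}(s) \, e^{-\int_s^t \lambda(v) \, \mathrm{d}v} \, \mathrm{d}s} = \frac{\int_{-\infty}^{t} \mathrm{NP}(s) \, e^{-\left(\int_s^t \lambda(v) \, \mathrm{d}v + \int_t^u \lambda(v) \, \mathrm{d}v\right)} \, \mathrm{d}s}{\int_{-\infty}^{t} \mathrm{NP}(s) \, e^{-\int_s^t \lambda(v) \, \mathrm{d}v} \, \mathrm{d}s} = e^{-\int_t^u \lambda(v) \, \mathrm{d}v}$$

According to Demey et al. (2003), the concept of new production has no meaning. Then,
we must focus on the modeling of the current volume of NMDs, which is given by Equation
(7.7) on page 386:

$$N(t) = \int_{-\infty}^{t} \mathrm{NP}(s) \, S(s,t) \, \mathrm{d}s$$

It follows that:

$$\frac{\mathrm{d}N(t)}{\mathrm{d}t} = \mathrm{NP}(t) \, S(t,t) - \lambda(t) \int_{-\infty}^{t} \mathrm{NP}(s) \, S(s,t) \, \mathrm{d}s$$

or:

$$\mathrm{d}N(t) = (\mathrm{NP}(t) - \lambda(t) \, N(t)) \, \mathrm{d}t \tag{7.24}$$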


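Therefore, the variation of $N(t)$ is the difference between deposit inflows $\mathrm{NP}(t)$ and deposit
outflows $\lambda(t) \, N(t)$. In the case where the new production and the hazard rate are constant,
i.e. $\mathrm{NP}(t) = \mathrm{NP}$ and $\lambda(t) = \lambda$, we obtain⁵⁶ $N(t) = N_{\infty} + (N_0 - N_{\infty}) \, e^{-\lambda (t - t_0)}$ where
$N_0 = N(t_0)$ is the current value and $N_{\infty} = \lambda^{-1} \mathrm{NP}$ is the long-term value of $N(t)$. In this
case, Equation (7.24) becomes:

$$\mathrm{d}N(t) = \lambda \, (N_{\infty} - N(t)) \, \mathrm{d}t \tag{7.25}$$

We recognize the deterministic part of the Ornstein-Uhlenbeck process:

$$\mathrm{d}N(t) = \lambda \, (N_{\infty} - N(t)) \, \mathrm{d}t + \sigma \, \mathrm{d}W(t) \tag{7.26}$$

where $W(t)$ is a Brownian motion. In this case, the solution is given by⁵⁷:

$$N(t) = N_0 \, e^{-\lambda (t - t_0)} + N_{\infty} \left(1 - e^{-\lambda (t - t_0)}\right) + \sigma \int_{t_0}^{t} e^{-\lambda (t - s)} \, \mathrm{d}W(s) \tag{7.27}$$

The estimation of the parameters $(\lambda, N_{\infty}, \sigma)$ can be done using the generalized method of
moments (GMM) or the method of maximum likelihood (ML). In this case, we can show
that:

$$N(t) \mid N(s) = N_s \sim \mathcal{N}\left(\mu_{(s,t)}, \sigma^2_{(s,t)}\right)$$

where:

$$\mu_{(s,t)} = N_s \, e^{-\lambda (t-s)} + N_{\infty} \left(1 - e^{-\lambda (t-s)}\right)$$

and:

$$\sigma^2_{(s,t)} = \sigma^2 \left(\frac{1 - e^{-2 \lambda (t-s)}}{2 \lambda}\right)$$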
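Because the conditional distribution is Gaussian with known moments, the process can be simulated exactly on any time grid. The Python sketch below is a minimal illustration; the function names, the time step and the parameter values ($N_{\infty} = 1000$, $\lambda = 10$, $\sigma = 1000$, starting from $N_0 = 500$) are assumptions chosen for this example.

```python
import math
import random


def ou_conditional_moments(n_s, n_inf, lam, sigma, dt):
    """Mean and variance of N(t) given N(s) = n_s, with dt = t - s."""
    decay = math.exp(-lam * dt)
    mean = n_s * decay + n_inf * (1.0 - decay)
    variance = sigma ** 2 * (1.0 - math.exp(-2.0 * lam * dt)) / (2.0 * lam)
    return mean, variance


def simulate_ou(n_0, n_inf, lam, sigma, dt, n_steps, seed=0):
    """Exact simulation of the process by sampling the Gaussian transition."""
    rng = random.Random(seed)
    path = [n_0]
    for _ in range(n_steps):
        mean, variance = ou_conditional_moments(path[-1], n_inf, lam, sigma, dt)
        path.append(mean + math.sqrt(variance) * rng.gauss(0.0, 1.0))
    return path


# Illustrative parameters; one step is roughly one week (in years)
path = simulate_ou(n_0=500.0, n_inf=1000.0, lam=10.0, sigma=1000.0,
                   dt=1.0 / 52.0, n_steps=52)
print(round(path[-1], 2))

# Over a long horizon, the moments converge to N_inf and sigma^2 / (2 * lambda)
print(ou_conditional_moments(500.0, 1000.0, 10.0, 1000.0, 100.0))
```

Sampling the exact transition avoids the discretization bias of an Euler scheme, which matters when $\lambda \, \mathrm{d}t$ is not small.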


Example 74 We consider a deposit account with the following characteristics: $N_{\infty} = \$1\,000$,
$\lambda = 10$ and $\sigma = \$1\,000$.

The frequency $\lambda$ means that the average duration of the deposit balance is equal to
$1/\lambda$. In our case, we find 1/10 = 0.1 years or 1.2 months. The new production is $\mathrm{NP} = \lambda N_{\infty} = \$10\,000$.
This new production can be interpreted as the annual income of the client
that funds the deposit account. In Figure 7.17, the top panel represents the expected
value $\mu_{(0,t)}$ of the deposit balance by considering different current values $N_0$, the top left
panel corresponds to the density function⁵⁸ $f_{(s,t)}(x)$ of $N(t)$ given that $N(s) = N_s$ and
the bottom panel shows three simulations of the stochastic process $N(t)$.

56 The solution of Equation (7.24) is given by:

$$N(t) - \frac{\mathrm{NP}}{\lambda} = \left(N_0 - \frac{\mathrm{NP}}{\lambda}\right) e^{-\lambda (t - t_0)}$$

57 See Appendix A.3.8.2 on page 1075.

FIGURE 7.17: Statistics of the deposit amount N (t)


58 We have:

$$f_{(s,t)}(x) = \frac{1}{\sigma_{(s,t)} \sqrt{2\pi}} \exp\left(-\frac{1}{2} \left(\frac{x - \mu_{(s,t)}}{\sigma_{(s,t)}}\right)^2\right)$$

Another extension of Model (7.25) is to make the distinction between stable and
non-stable deposits. Let g be the growth rate of deposits. The total amount of deposits $D(t)$ is
given by:

$$D(t) = e^{g (t-s)} \sum_{i=1}^{n_t} N_i(t)$$

where $n_t$ is the number of deposit accounts and $N_i(t)$ is the deposit balance of the i-th
deposit account. It follows that:

$$D(t) = e^{g (t-s)} \sum_{i=1}^{n_t} N_{\infty,i} + e^{g (t-s)} \sum_{i=1}^{n_t} (N_{s,i} - N_{\infty,i}) \, e^{-\lambda_i (t-s)} + e^{g (t-s)} \sum_{i=1}^{n_t} \sigma_i \sqrt{\frac{1 - e^{-2 \lambda_i (t-s)}}{2 \lambda_i}} \, \varepsilon_i(t)$$

where $\varepsilon_i(t) \sim \mathcal{N}(0,1)$. By considering a representative agent, we can replace the previous
equation by the following expression:

$$D(t) = D_{\infty} \, e^{g (t-s)} + (D_s - D_{\infty}) \, e^{(g - \lambda)(t-s)} + \varepsilon(t) \tag{7.28}$$

where $D_{\infty} = \sum_{i=1}^{n_t} N_{\infty,i}$, $D_s = \sum_{i=1}^{n_t} N_{s,i}$, $\lambda^{-1}$ is the weighted average duration of deposits
and $\varepsilon(t)$ is the stochastic part. Demey et al. (2003) notice that we can decompose $D(t)$
into two terms:

$$D(t) = D_{\mathrm{long}}(s,t) + D_{\mathrm{short}}(s,t)$$

where $D_{\mathrm{long}}(s,t) = D_{\infty} \, e^{g (t-s)}$ and $D_{\mathrm{short}}(s,t) = (D_s - D_{\infty}) \, e^{(g - \lambda)(t-s)} + \varepsilon(t)$. This
breakdown seems appealing at first sight, but it presents a major drawback. Indeed, the short
component $D_{\mathrm{short}}(s,t)$ may be negative. In practice, it is better to consider the following
equation:

$$D(t) = \underbrace{\varphi \, D_{\infty} \, e^{g (t-s)}}_{D_{\mathrm{stable}}(s,t)} + \underbrace{(D_s - D_{\infty}) \, e^{(g - \lambda)(t-s)} + \varepsilon(t) + (1 - \varphi) \, D_{\infty} \, e^{g (t-s)}}_{D_{\mathrm{non-stable}}(s,t)}$$

where $D_{\mathrm{stable}}(s,t)$ corresponds to the amount of stable deposits and $D_{\mathrm{non-stable}}(s,t) = D(t) - D_{\mathrm{stable}}(s,t)$
is the non-stable deposit amount. At time t = s, we verify that⁵⁹:

$$D(t) = D_{\mathrm{stable}} + D_{\mathrm{non-stable}}(t)$$

The estimation of stable deposits is a two-step process. First, we estimate $D_{\infty}$ by using the
ML method. Second, we estimate the fraction $\varphi \leq 1$ of the long-run amount of deposits
that can be considered as stable. Generally, we calibrate the parameter $\varphi$ such that $\varphi D_{\infty}$
is the quantile of $D(t)$ at a given confidence level (e.g. 90% or 95%).


In Figure 7.18, we assume that the deposit amount $D(t)$ follows an Ornstein-Uhlenbeck
process with parameters $D_{\infty} = \$1$ bn, $\lambda = 5$ and $\sigma = \$200$ mn. In the top/right panel,
we have reported the $D_{\mathrm{long}}/D_{\mathrm{short}}$ breakdown. We verify that the short component may
be negative, meaning that the long component cannot be considered as a stable part. This is
not the case with the $D_{\mathrm{stable}}/D_{\mathrm{non-stable}}$ breakdown given in the bottom panels. The big
issue is of course the estimation of the parameter $\varphi$. One idea might be to calibrate $\varphi$ such
that $\Pr\{D(t) \leq \varphi D_{\infty}\} = 1 - \alpha$ given the confidence level $\alpha$. If we consider the
Ornstein-Uhlenbeck dynamics, we obtain the following formula:

$$\varphi = 1 + \frac{\sigma \, \Phi^{-1}(1 - \alpha)}{D_{\infty} \sqrt{2 \lambda}}$$

In our example, this ratio is respectively equal to 85.3%, 89.6% and 91.9% when $\alpha$ takes
the value 99%, 95% and 90%.
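The three ratios can be reproduced from the stationary distribution of the Ornstein-Uhlenbeck process, which is Gaussian with mean $D_{\infty}$ and variance $\sigma^2 / (2 \lambda)$. The Python sketch below uses only the standard library; the function name `stable_fraction` is an assumption made for this illustration, and amounts are expressed in millions of dollars.

```python
from math import sqrt
from statistics import NormalDist


def stable_fraction(alpha: float, d_inf: float, lam: float, sigma: float) -> float:
    """Fraction phi such that Pr{D(t) <= phi * D_inf} = 1 - alpha under the
    stationary distribution N(D_inf, sigma^2 / (2 * lam))."""
    quantile = NormalDist().inv_cdf(1.0 - alpha)  # negative when alpha > 50%
    return 1.0 + sigma * quantile / (d_inf * sqrt(2.0 * lam))


# Parameters of the example, in $ mn: D_inf = 1000, lambda = 5, sigma = 200
for alpha in (0.99, 0.95, 0.90):
    phi = stable_fraction(alpha, d_inf=1000.0, lam=5.0, sigma=200.0)
    print(f"alpha = {alpha:.0%}: phi = {phi:.1%}")
```

A higher confidence level or a more volatile deposit base lowers the fraction that can be treated as stable.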


Remark 85 We recall that the Basel Committee makes the distinction between stable and 
core deposits. It is assumed that the interest rate elasticity of NMDs is less than one. Core 
deposits are the proportion of stable deposits, whose pass through sensitivity is particularly 
low, meaning they are “unlikely to reprice even under significant changes in interest rate 
environment” (BCBS, 2016d, page 26). 


7.3.1.2 Behavioral modeling 


If we assume that the growth rate g is equal to zero, the linearization of Equation (7.28) 
corresponds to the Euler approximation of the Ornstein-Uhlenbeck process: 


\[D(t) \approx D(s) + \lambda\,(D_\infty - D(s)) + \varepsilon(t) \tag{7.29}\]


⁵⁹ The previous results are based on the dynamic analysis between time \(s\) and \(t\). If we prefer to adopt a
static analysis, the amount of non-stable deposits must be defined as follows:

\[D_{\text{non-stable}}(t) = D(t) - \varphi D_\infty\]

FIGURE 7.18: Stable and non-stable deposits


Here, D (t) is the value of the non-maturity account balance or deposit volume. A similar 
expression is obtained by considering the individual deposit amount N (t) instead of D (t). 
In what follows, we use the same notation D (t) for defining aggregated and individual 
deposit balances. Let us come back to the general case: \(\mathrm{d}D(t) = (\mathrm{NP}(t) - \lambda(t)\,D(t))\,\mathrm{d}t\).
By assuming that the new production is a function of the current balance, we have \(\mathrm{NP}(t) =
g(t, X(t))\,D(t)\) where \(g(t, X(t))\) depends on a set of explanatory variables \(X(t)\). It follows
that \(\mathrm{d}\ln D(t) = (g(t, X(t)) - \lambda(t))\,\mathrm{d}t\) and:

\[\ln D(t) \approx \ln D(s) + g(s, X(s)) - \lambda(s) \tag{7.30}\]


Modeling the behavior of the client and introducing embedded options can be done by 
combining Equations (7.29) and (7.30): 


\[\ln D(t) = \ln D(s) + \lambda\,(\ln D_\infty - \ln D(s)) + g(t, X(t)) + \varepsilon(t)\]


In this case, the main issue is to specify g(t, X (t)) and the explanatory variables that 
impact the dynamics of the deposit volume. Most of the time, g (t, X (t)) depends on two 
variables: the deposit rate i (t) and the market rate r (t). In what follows, we present several 
models that have been proposed for modeling either D (t) or i(t) or both. The two pioneer 
models are the deposit balance model of Selvaggio (1996) and the deposit rate model of 
Hutchison and Pennacchi (1996). 


The Hutchison-Pennacchi-Selvaggio framework In Selvaggio (1996), the deposit 
rate i(t) is exogenous and the bank account holder modifies his current deposit balance 
D (t) to target a level D* (t), which is defined as follows: 


\[\ln D^\star(t) = \beta_0 + \beta_1 \ln i(t) + \beta_2 \ln Y(t)\]


where \(Y(t)\) is the income of the account holder. The rationale of this model is the following.
In practice, the bank account holder targets a minimum positive balance in order to meet
his current liquidity and consumption needs, which are a function of his income \(Y(t)\). For
example, we can assume that the client with a monthly income of $10000 targets a larger
amount than the client with a monthly income of $1000. Moreover, we can assume that the
target balance depends on the deposit rate \(i(t)\). The elasticity coefficient must be positive,
meaning that the client has a high incentive to transfer his money into a term deposit
account if the deposit rate is low. At time \(t\), the account holder can face two situations.
If \(D_{t-1} < D_t^\star\), he will certainly increase his deposit volume in order to increase his cash
liquidity. If \(D_{t-1} > D_t^\star\), he will certainly transfer a part of his deposit balance into his
term account. Therefore, the behavior of the bank account holder can be represented by a
mean-reverting AR(1) process:


\[\ln D(t) - \ln D(t-1) = (1-\phi)\,(\ln D^\star(t) - \ln D(t-1)) + \varepsilon(t) \tag{7.31}\]

where \(\varepsilon(t) \sim \mathcal{N}(0, \sigma^2)\) is a white noise process and \(\phi < 1\) is the mean-reverting parameter.
It follows that:

\[\begin{aligned}
\ln D(t) &= \phi \ln D(t-1) + (1-\phi) \ln D^\star(t) + \varepsilon(t) \\
&= \phi \ln D(t-1) + \beta_0' + \beta_1' \ln i(t) + \beta_2' \ln Y(t) + \varepsilon(t)
\end{aligned} \tag{7.32}\]

where \(\beta_k' = (1-\phi)\,\beta_k\). Let \(d(t) = \ln D(t)\) be the logarithm of the deposit volume. The
model of Selvaggio (1996) is then an ARX(1) process:

\[d(t) = \phi\, d(t-1) + (1-\phi)\, d^\star(t) + \varepsilon(t) \tag{7.33}\]

where \(d^\star(t) = \ln D^\star(t)\) is the exogenous variable.
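
To make the mean-reverting mechanism concrete, here is a minimal Python sketch of the ARX(1) recursion towards a constant target. The parameter values (\(\beta\) coefficients, deposit rate, income, \(\phi\)) and the function names are hypothetical and chosen only for illustration.

```python
import math
import random


def target_log_balance(beta, deposit_rate, income):
    """ln D*(t) = beta0 + beta1 * ln i(t) + beta2 * ln Y(t)."""
    b0, b1, b2 = beta
    return b0 + b1 * math.log(deposit_rate) + b2 * math.log(income)


def arx_step(d_prev, d_star, phi, eps=0.0):
    """One step of d(t) = phi * d(t-1) + (1 - phi) * d*(t) + eps(t)."""
    return phi * d_prev + (1.0 - phi) * d_star + eps


def simulate(d0, d_star, phi, sigma, n_steps, seed=0):
    """Simulate the log-balance when the target d_star stays constant."""
    rng = random.Random(seed)
    path = [d0]
    for _ in range(n_steps):
        path.append(arx_step(path[-1], d_star, phi, rng.gauss(0.0, sigma)))
    return path


# Hypothetical customer: beta = (2, 0.5, 0.8), i = 1%, Y = $3000, phi = 0.8.
d_star = target_log_balance((2.0, 0.5, 0.8), 0.01, 3000.0)
noiseless = simulate(d0=math.log(1000.0), d_star=d_star, phi=0.8, sigma=0.0, n_steps=60)
gap_start = abs(noiseless[0] - d_star)
gap_end = abs(noiseless[-1] - d_star)
print(f"initial gap {gap_start:.4f}, gap after 60 periods {gap_end:.2e}")
```

Without noise, the gap between \(d(t)\) and the target shrinks by the factor \(\phi\) at each period, so a value of \(\phi\) close to one corresponds to a slow adjustment of the balance.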


In practice, the bank does not know the value \(\theta = (\phi, \beta_0, \beta_1, \beta_2, \sigma)\) of the parameters.
Moreover, these parameters are customer-specific and are different from one customer to
another. The bank could then estimate the vector \(\theta\) for a given customer if it had a sufficient
history. For instance, we consider that a two-year dataset of monthly observations or a
ten-year dataset of quarterly observations is generally sufficient to estimate five parameters.
However, the variables \(i(t)\) and \(Y(t)\) rarely change, meaning that it is impossible to estimate
\(\theta\) for a given customer. Instead of using a time-series analysis, banks then prefer to consider
a cross-section/panel analysis. Because Model (7.33) is linear, we can aggregate the behavior
of the different customers. The average behavior of a customer is given by Equation (7.32)
where the parameters \(\phi\), \(\beta_0\), \(\beta_1\), \(\beta_2\) and \(\sigma\) are equal to the mean of the customer parameters.
This approach has the advantage of being more robust in terms of statistical inference. Indeed,
the regression is performed using a large number of observations (number of customers ×
number of time periods).


In the previous model, the deposit interest rate is given and observed at each time 
period. Hutchison and Pennacchi (1996) propose a model for fixing the optimal value of 
i (t). They assume that the bank maximizes its profit: 


\[i^\star(t) = \arg\max\, \Pi(t)\]

where the profit \(\Pi(t)\) is equal to the revenue minus the cost:

\[\Pi(t) = r(t) \cdot D(t) - (i(t) + c(t)) \cdot D(t)\]


In this expression, r (t) is the market interest rate and c(t) is the cost of issuing deposits. 
By assuming that D (t) is an increasing function of i(t), the first-order condition is: 


\[(r(t) - i(t) - c(t)) \cdot \frac{\partial D(t)}{\partial i(t)} - D(t) = 0\]

We deduce that:

\[i(t) = r(t) - c(t) - \left(\frac{\partial \ln D(t)}{\partial i(t)}\right)^{-1} = r(t) - s(t) \tag{7.34}\]

The deposit interest rate is then equal to the market interest rate \(r(t)\) minus a spread⁶⁰
\(s(t)\). Equations (7.32) and (7.34) are the backbone of various non-maturity deposit models.
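
As an illustration of the first-order condition, the sketch below assumes a hypothetical demand function \(D(i) = e^{a + b\,i}\) with \(b > 0\), which is not specified in the text. Under this assumption, the optimal deposit rate is \(r - c - 1/b\), and a brute-force search over a grid of rates is used as a cross-check.

```python
import math


def profit(i, r, c, a, b):
    """Profit (r - i - c) * D(i) with the assumed demand D(i) = exp(a + b * i)."""
    return (r - i - c) * math.exp(a + b * i)


def optimal_rate(r, c, b):
    """Solve (r - i - c) * dD/di - D = 0 when dD/di = b * D."""
    return r - c - 1.0 / b


# Hypothetical values: market rate 5%, issuing cost 1%, semi-elasticity b = 50.
r, c, a, b = 0.05, 0.01, 0.0, 50.0
i_star = optimal_rate(r, c, b)               # spread s = c + 1/b = 3%
grid = [k / 10000 for k in range(0, 501)]    # 0% to 5% in steps of 1 bp
i_grid = max(grid, key=lambda i: profit(i, r, c, a, b))
print(f"closed form: {i_star:.2%}, grid search: {i_grid:.2%}")
```

The more rate-sensitive the depositors are (larger \(b\)), the smaller the spread the bank can keep between the market rate and the deposit rate.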


The IRS framework Using arbitrage theory, Jarrow and van Deventer (1998) show that
the deposit rate must be lower than the market rate⁶¹, that is \(i(t) < r(t)\), and the current market
value of deposits is the net present value of the cash flow stream \(D(t)\):

\[V(0) = \mathbb{E}\left[\sum_{t=0}^{\infty} B(0, t+1)\,(r(t) - i(t))\,D(t)\right] \tag{7.35}\]


where B (0, t) is the discount factor. Jarrow and van Deventer (1998) interpret V (0) as an 
exotic interest rate swap, where the bank receives the market rate and pays the deposit rate. 
Since the present value of the deposit liability of the bank is equal to L (0) = D (0) — V (0), 
the hedging strategy consists in “investing D (0) dollars in the shortest term bond B (0,1) 
and shorting the exotic interest rate swap represented by V (0)” (Jarrow and van Deventer, 
1998, page 257). The complete computation of the hedging portfolio requires specifying i (t) 
and D (t). For example, Jarrow and van Deventer (1998) consider the following specification: 


\[\ln D(t) = \ln D(t-1) + \beta_0 + \beta_1 r(t) + \beta_2\,(r(t) - r(t-1)) + \beta_3 t \tag{7.36}\]

and:

\[i(t) = i(t-1) + \beta_0' + \beta_1' r(t) + \beta_2'\,(r(t) - r(t-1)) \tag{7.37}\]

The deposit balance and the deposit rate are linear in the market rate \(r(t)\) and the variation
of the market rate \(\Delta r(t)\). The authors also add a trend in Equation (7.36) in order to take
into account macroeconomic variables that are not included in the model.
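
The valuation formula can be illustrated along a single rate path. The Python sketch below is a simplification: it replaces the expectation by one deterministic path, truncates the infinite sum at the length of the path, and builds the discount factors by compounding the path's own per-period rates. The function name `jvd_value` and all numerical values are hypothetical.

```python
import math


def jvd_value(r_path, i0, d0, beta_d, beta_i):
    """Sum of B(0, t+1) * (r(t) - i(t)) * D(t) along one rate path.

    r_path[0] is r(0); i0 and d0 are the initial deposit rate and balance.
    beta_d = (b0, b1, b2, b3) drives ln D(t), beta_i = (b0, b1, b2) drives i(t).
    Rates are expressed per period; the sum stops at the end of the path.
    """
    value, discount = 0.0, 1.0
    i_t, ln_d = i0, math.log(d0)
    for t, r_t in enumerate(r_path):
        if t > 0:
            dr = r_t - r_path[t - 1]
            ln_d += beta_d[0] + beta_d[1] * r_t + beta_d[2] * dr + beta_d[3] * t
            i_t += beta_i[0] + beta_i[1] * r_t + beta_i[2] * dr
        discount /= 1.0 + r_t                  # B(0, t+1) on this path
        value += discount * (r_t - i_t) * math.exp(ln_d)
    return value


# Flat market rate of 3%, constant deposit rate of 1%, constant balance of 100.
flat = jvd_value([0.03] * 50, 0.01, 100.0, (0, 0, 0, 0), (0, 0, 0))
print(f"value of the margin stream over 50 periods: {flat:.2f}")
```

In this degenerate case the result reduces to an annuity on the 2% margin, which gives a quick sanity check before turning on the rate sensitivities.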


The previous model is fully tractable in continuous-time. Beyond these analytical for- 
mulas, the main interest of the Jarrow-van Deventer model is to show that the modeling of 
non-maturity deposits is related to the modeling of interest rate swaps. Another important 
contribution of this model is the introduction of the replicating portfolio. Indeed, it is com- 
mon to break down deposits into stable and non-stable deposits, and stable deposits into 
core and non-core deposits. The idea is then to replicate the core deposits with a hedging 
portfolio with four maturities (3, 5, 7 and 10 years). In this case, the funds transfer pricing 
of non-maturity deposits is made up of four internal transfer rates corresponding to the 
maturity pillars of the replicating portfolio. 


⁶⁰ We notice that the spread \(s(t)\) is the sum of the cost \(c(t)\) and the Lerner index \(\eta(t)\), where \(\eta(t) = 1/e(t)\)
and \(e(t)\) is the interest rate elasticity of the demand.

⁶¹ This inequality is obtained by assuming no arbitrage opportunities for individuals and market
segmentation. In particular, Jarrow and van Deventer (1998) consider that the competition among banks is
imperfect because of entry and mobility barriers to the banking industry.

Asymmetric adjustment models O’Brien (2001) introduces an asymmetric adjustment
of the deposit rate:

\[\Delta i(t) = \alpha(t)\,(\hat{\imath}(t) - i(t-1)) + \eta(t)\]

where \(\hat{\imath}(t)\) is the conditional equilibrium deposit rate and:

\[\alpha(t) = \alpha^{+} \cdot \mathbb{1}\{\hat{\imath}(t) > i(t-1)\} + \alpha^{-} \cdot \mathbb{1}\{\hat{\imath}(t) < i(t-1)\}\]

If \(\hat{\imath}(t) > i(t-1)\), we obtain \(\Delta i(t) = \alpha^{+} \cdot (\hat{\imath}(t) - i(t-1)) + \eta(t)\), otherwise we have
\(\Delta i(t) = \alpha^{-} \cdot (\hat{\imath}(t) - i(t-1)) + \eta(t)\). The distinction between \(\alpha^{+}\) and \(\alpha^{-}\) can be justified by the
asymmetric behavior of banks and the rigidity of deposit rates. In particular, O’Brien (2001)
suggests that \(\alpha^{-} > \alpha^{+}\), implying that banks adjust the deposit rate more easily when the
market rate decreases than when it increases. In this model, the deposit balance is a function
of the spread \(r(t) - i(t)\):

\[\ln D(t) = \beta_0 + \beta_1 \ln D(t-1) + \beta_2\,(r(t) - i(t)) + \beta_3 \ln Y(t) + \varepsilon(t)\]

Moreover, O’Brien (2001) assumes that the conditional equilibrium deposit rate is a linear
function of the market rate:

\[\hat{\imath}(t) = \gamma_0 + \gamma_1 \cdot r(t)\]
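
The asymmetric partial adjustment can be sketched in a few lines of Python. The adjustment speeds and the coefficients of the equilibrium rate below are hypothetical and only serve to show the two regimes.

```python
def equilibrium_rate(r, gamma0, gamma1):
    """Conditional equilibrium deposit rate, linear in the market rate."""
    return gamma0 + gamma1 * r


def adjust_deposit_rate(i_prev, i_hat, alpha_up, alpha_down, eta=0.0):
    """One step of the asymmetric partial adjustment of the deposit rate.

    alpha_up applies when the equilibrium rate is above the previous deposit
    rate, alpha_down when it is below.
    """
    alpha = alpha_up if i_hat > i_prev else alpha_down
    return i_prev + alpha * (i_hat - i_prev) + eta


# Hypothetical parameters: slow upward (0.2) and fast downward (0.6) adjustment.
i_prev = 0.02
i_after_hike = adjust_deposit_rate(i_prev, equilibrium_rate(0.06, 0.0, 0.5), 0.2, 0.6)
i_after_cut = adjust_deposit_rate(i_prev, equilibrium_rate(0.02, 0.0, 0.5), 0.2, 0.6)
print(f"after a rate hike: {i_after_hike:.2%}, after a rate cut: {i_after_cut:.2%}")
```

With these values the deposit rate closes 20% of the gap to its equilibrium level after a hike, but 60% of the gap after a cut.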

In the previous model, the asymmetric adjustment explicitly concerns the deposit interest
rate \(i(t)\) and implicitly impacts the deposit balance \(D(t)\) because of the spread
\(r(t) - i(t)\). Frachot (2001) considers an extension of the Selvaggio model by adding a
correction term that depends on the market interest rate \(r(t)\) and a threshold:

\[\ln D(t) - \ln D(t-1) = (1-\phi)\,(\ln D^\star(t) - \ln D(t-1)) + \delta_c(r(t), r^\star) \tag{7.38}\]

where \(\delta_c(r(t), r^\star) = \delta \cdot \mathbb{1}\{r(t) \le r^\star\}\) and \(r^\star\) is the interest rate floor. When market interest
rates are too low and below \(r^\star\), the bank account holder does not make the distinction
between deposit and term balances, and we have:

\[\delta_c(r(t), r^\star) = \begin{cases} \delta & \text{if } r(t) \le r^\star \\ 0 & \text{otherwise} \end{cases}\]

Contrary to the Selvaggio model, the average behavior is not given by Equation (7.38)
because of the non-linearity pattern. Let \(f\) be the probability density function of the threshold
\(r^\star\) among the different customers of the bank. On average, we have:

\[\mathbb{E}\left[\delta_c(r(t), r^\star)\right] = \int \delta \cdot \mathbb{1}\{r(t) \le x\}\, f(x)\,\mathrm{d}x = \delta \cdot (1 - F(r(t)))\]

The average behavior is then given by the following equation:

\[d(t) - d(t-1) = (1-\phi)\,(d^\star(t) - d(t-1)) + \delta\,(1 - F(r(t)))\]

where \(d(t) = \ln D(t)\) and \(d^\star(t) = \ln D^\star(t)\). For example, if we assume that the distribution
of \(r^\star\) is uniform on the range \([0; r^\star_{\max}]\), we obtain \(f(x) = 1/r^\star_{\max}\) and \(F(x) = \min(x/r^\star_{\max}, 1)\).
We deduce that:

\[d(t) - d(t-1) = (1-\phi)\,(d^\star(t) - d(t-1)) + \delta \left(1 - \min\left(\frac{r(t)}{r^\star_{\max}}, 1\right)\right)\]

In the case where \(r^\star \sim \mathcal{N}(\mu_\star, \sigma_\star^2)\), we obtain:

\[d(t) - d(t-1) = (1-\phi)\,(d^\star(t) - d(t-1)) + \delta\, \Phi\left(\frac{\mu_\star - r(t)}{\sigma_\star}\right)\]


Another asymmetric model was proposed by OTS (2001):

\[d(t) = d(t-1) + \Delta \ln\left(\beta_0 + \beta_1 \arctan\left(\beta_2 + \beta_3 \frac{i(t)}{r(t)}\right) + \beta_4\, i(t)\right) + \varepsilon(t)\]

where \(\Delta\) corresponds to the frequency. The ‘Net Portfolio Value Model’ published by the
Office of Thrift Supervision⁶² is a comprehensive report that contains dozens of models in
order to implement risk management and ALM policies. For instance, Chapter 6 describes
the methodologies for modeling liabilities and Section 6.D is dedicated to demand deposits.
These models were very popular in the US in the 1990s. In 2011, the Office of the Comptroller
of the Currency (OCC) provided the following parameters for the monthly model⁶³ of
transaction accounts: \(\beta_0 = 0.773\), \(\beta_1 = -0.065\), \(\beta_2 = -5.959\), \(\beta_3 = 0.997\) and \(\beta_4 = 1\)
bp. In the case of money market accounts, the parameters were \(\beta_0 = 0.643\), \(\beta_1 = -0.069\),
\(\beta_2 = -6.284\), \(\beta_3 = 2.011\) and \(\beta_4 = 1\) bp.


[Figure 7.19 panels: O’Brien model; Frachot model]

FIGURE 7.19: Impact of the market rate on the growth rate of deposits


⁶² The mission of OTS is to “supervise savings associations and their holding companies in order to
maintain their safety and soundness and compliance with consumer laws and to encourage a competitive
industry that meets America’s financial services needs”.

⁶³ We have \(\Delta = 1/12\).

In Figure 7.19, we compare the growth rate \(g(t)\) of deposits for the different asymmetric
models. For the O’Brien model, the growth rate is equal to \(g(t) = \beta_2\,(r(t) - i(t))\). In the
case of the Frachot model, the market rate has only a positive impact because \(\delta_c(r(t), r^\star) \ge 0\).
This is why we consider an extended version where the correction term is equal to
\(\delta_c(r(t), r^\star) - \delta^{-}\). The growth rate is then \(g(t) = \delta\,(1 - F(r(t))) - \delta^{-}\). Finally, the growth
rate of the OTS model is equal to \(g(t) = \ln\left(\beta_0 + \beta_1 \arctan\left(\beta_2 + \beta_3\, i(t)/r(t)\right) + \beta_4\, i(t)\right)\). Using
several values of the deposit rate \(i(t)\), we measure the impact of the market rate \(r(t)\) on
the growth rate \(g(t)\) using the following parameters: \(\beta_2 = -4\) (O’Brien model), \(\delta = 30\%\),
\(\mu_\star = 5\%\), \(\sigma_\star = 1\%\) and \(\delta^{-} = 10\%\) (Frachot model), and \(\beta_0 = 1.02\), \(\beta_1 = 0.2\), \(\beta_2 = -7\),
\(\beta_3 = 5\) and \(\beta_4 = 0\) (OTS model). The O’Brien model is linear while the Frachot model is
non-linear. However, the Frachot model does not depend on the level of the deposit rate
\(i(t)\). The OTS model combines non-linear effects and the dependence on the deposit rate.
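
The three growth rates can be compared numerically with the parameters quoted above. The Python sketch below assumes a Gaussian threshold distribution for the extended Frachot model and reads the OTS specification as depending on the ratio \(i(t)/r(t)\); the function names and the deposit rate of 2% used in the loop are illustrative choices.

```python
from math import atan, log
from statistics import NormalDist


def growth_obrien(r, i, beta2=-4.0):
    """O'Brien: g(t) = beta2 * (r(t) - i(t))."""
    return beta2 * (r - i)


def growth_frachot(r, delta=0.30, mu=0.05, sigma=0.01, delta_minus=0.10):
    """Extended Frachot: g(t) = delta * (1 - F(r(t))) - delta_minus, Gaussian F."""
    share_above_market_rate = 1.0 - NormalDist(mu, sigma).cdf(r)
    return delta * share_above_market_rate - delta_minus


def growth_ots(r, i, beta=(1.02, 0.2, -7.0, 5.0, 0.0)):
    """OTS: g(t) = ln(b0 + b1 * arctan(b2 + b3 * i / r) + b4 * i), requires r > 0."""
    b0, b1, b2, b3, b4 = beta
    return log(b0 + b1 * atan(b2 + b3 * i / r) + b4 * i)


for r in (0.01, 0.03, 0.05, 0.07, 0.09):
    i = 0.02
    print(f"r = {r:.0%}: O'Brien {growth_obrien(r, i):+.3f}, "
          f"Frachot {growth_frachot(r):+.3f}, OTS {growth_ots(r, i):+.3f}")
```

Only the O'Brien and OTS functions take the deposit rate as an argument, which mirrors the remark that the Frachot growth rate does not depend on \(i(t)\).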


Remark 86 These different models have been extended in order to take into account other
explanatory variables such as the CDS of the bank, the inflation rate, the deposit rate
competition, lagged effects, etc. We can then use standard econometric and time-series tools
for estimating the unknown parameters.


7.3.2 Prepayment risk 


A prepayment is the settlement of a debt or the partial repayment of its outstanding 
amount before its maturity date. It is an important risk for the ALM of a bank, because 
it highly impacts the net interest income and the efficiency of the hedging portfolio. For 
example, suppose that the bank has financed a 10-year mortgage paying 5% through a 
10-year bond paying 4%. The margin on this mortgage is equal to 1%. Five years later, 
the borrower prepays the mortgage because of a fall in interest rates. In this case, the 
bank receives the cash of the mortgage refund whereas it continues to pay a coupon of 4%. 
Certainly, the cash will yield a lower return than previously, implying that the margin is 
reduced and may become negative. 


Prepayment risk shares some common features with default risk. Indeed, the prepayment 
time can be seen as a stopping time exactly like the default time for credit risk. Prepayment 
and default are then the two actions that may terminate the loan contract. This is why 
they have been studied together in some research. However, they also present some strong 
differences. In the case of the default risk, the income of the bank is reduced because both 
interest and capital payments are shut down. In the case of the prepayment risk, the bank 
recovers the capital completely, but no longer receives the interest due. Moreover, while 
default risk increases when the economic environment is bad or interest rates are high, 
prepayment risk is more pronounced in a period of falling interest rates. 

In the 1980s, prepayment was extensively studied in the case of RMBS. The big
issue was to develop a pricing model for GNMA⁶⁴ mortgage-backed pass-through securities
(Dunn and McConnell, 1981; Brennan and Schwartz, 1985; Schwartz and Torous, 1989).
In these approaches, the prepayment option is assimilated to an American call option and
the objective of the borrower is to exercise the option when it has the largest value⁶⁵
(Schwartz and Torous, 1992). However, Deng et al. (2000) show that “there exists significant
heterogeneity among mortgage borrowers and ignoring this heterogeneity results in serious 
errors in estimating the prepayment behavior of homeowners”. Therefore, it is extremely 
difficult to model the prepayment behavior, because it is not always a rational decision 
and many factors affect prepayment decisions (Keys et al., 2016; Chernov et al., 2017). 
This microeconomic approach is challenged by a macroeconomic approach, whose goal is to 
model the prepayment rate at the portfolio level and not the prepayment time at the loan 
level. 


⁶⁴ The Government National Mortgage Association (GNMA or Ginnie Mae) has already been presented
on page 139.

⁶⁵ This implies that the call option is in the money.

In what follows, we focus on mortgage loans, because they are the main component of
prepayment risk. However, the analysis can be extended to other loans, for example consumer
credit, student loans and leasing contracts. The case of student loans is very interesting
since students are eager to repay their loan as soon as possible once they have
found a job and make enough money.


7.3.2.1 Factors of prepayment 


Following Hayre et al. (2000), prepayments are caused by two main factors: refinancing
and housing turnover. Let \(i_0\) be the original interest rate of the mortgage or the loan. We
note \(i(t)\) the interest rate of the same mortgage if the household were to finance it at time
\(t\). It is clear that the prepayment time \(\tau\) depends on the interest rate differential, and
we can assume that the prepayment probability is an increasing function of the difference
\(\Delta i(t) = i_0 - i(t)\):

\[P(t) = \Pr\{\tau \le t\} = \vartheta(i_0 - i(t))\]

where \(\partial_x \vartheta(x) > 0\). For instance, if the original mortgage interest rate is equal to 10% and
the current mortgage interest rate is equal to 0%, nobody benefits from keeping the original 
mortgage, and it is preferable to fully refinance the mortgage. This situation is particularly 
true in a period of falling interest rates. The real life example provided by Keys et al. (2016) 
demonstrates the strong implication that a prepayment may have on household budgeting: 


“A household with a 30-year fixed-rate mortgage of $200000 at an interest 
rate of 6.0% that refinances when rates fall to 4.5% (approximately the average 
rate decrease between 2008 and 2010 in the US) saves more than $60000 in 
interest payments over the life of the loan, even after accounting for refinance 
transaction costs. Further, when mortgage rates reached all-time lows in late 
2012, with rates of roughly 3.35% prevailing for three straight months, this 
household with a contract rate of 6.5% would save roughly $130 000 over the life 
of the loan by refinancing” (Keys et al., 2016, pages 482-483). 


As already said, the prepayment value is the premium of an American call option, mean- 
ing that we can derive the optimal option exercise. In this case, the prepayment strategy 
can be viewed as an arbitrage strategy between the market interest rate and the cost of 
refinancing. In practice, we observe that the prepayment probability P (t) depends on other 
factors: loan type, loan age, loan balance, monthly coupon (Elie et al., 2002). For example, it 
is widely accepted that the prepayment probability is an increasing function of the monthly 
coupon. 


The second factor for explaining prepayments is housing turnover. In this case, the 
prepayment decision is not motivated by refinancing, but it is explained by the home sale due 
to life events. For instance, marriage, divorce, death, children leaving home or changing jobs 
explain a large part of prepayment rates. Another reason is the housing market dynamics, 
in particular home prices that have an impact on housing turnover. These different factors 
explain that we also observe prepayments even when interest rates increase. For example, the 
upgrading housing decision (i.e. enhancing the capacity or improving the quality of housing) 
is generally explained by the birth of a new child, an inheritance or a salary increase. 


Remark 87 In addition to these two main factors, we also observe that some borrowers
choose to reduce their debt even if it is not an optimal decision. When they have some
financial savings, which may be explained by an inheritance for example, they proceed to
partial prepayments.


7.3.2.2 Structural models


As with credit risk, there are two families of prepayment models. The objective of
structural models is to explain the prepayment time \(\tau\) of a borrower while reduced-form
models are interested in the prepayment rate of a loan portfolio.


Value of the American option The objective is to find the optimal value \(\tau\) such that
the borrower minimizes the paid cash flows or maximizes the prepayment option. Let us
consider a mortgage, whose maturity is equal to \(T\). In continuous-time, the risk-neutral
value of cash flows is equal to⁶⁶:

\[V(t) = \inf_{\tau \le T} \mathbb{E}^{\mathbb{Q}}\left[\int_t^{\tau} m(u)\, e^{-\int_t^u r(s)\,\mathrm{d}s}\,\mathrm{d}u + e^{-\int_t^{\tau} r(s)\,\mathrm{d}s}\, M(\tau) \,\middle|\, \mathcal{F}_t\right] \tag{7.39}\]

where \(m(t)\) and \(M(t)\) are the coupon and the mark-to-market value of the mortgage at
time \(t\). The first term that makes up \(V(t)\) is the discounted value of the interest paid until
the prepayment time \(\tau\) whereas the second term is the discounted value of the mortgage
value at the prepayment time \(\tau\). Equation (7.39) is a generalization of the net present value
of a mortgage in continuous-time⁶⁷. The computation of the optimal stopping time can be
done in a Hamilton-Jacobi-Bellman (HJB) framework. We introduce the state variable \(X_t\),
which follows a diffusion process:

\[\mathrm{d}X(t) = \mu(t, X(t))\,\mathrm{d}t + \sigma(t, X(t))\,\mathrm{d}W(t)\]

We note \(V(t, X)\) the value of \(V(t)\) when \(X(t)\) is equal to \(X\). In the absence of prepayment,
we deduce that the value of \(V(t, X)\) satisfies the following Cauchy problem⁶⁸:

\[\begin{cases} -\partial_t V(t, X) + r(t)\, V(t, X) = \mathcal{A}_t V(t, X) + m(t) \\ V(T, X) = M(T) \end{cases}\]

where \(\mathcal{A}_t\) is the infinitesimal generator of the diffusion process:

\[\mathcal{A}_t V(t, X) = \frac{1}{2} \sigma^2(t, X)\, \frac{\partial^2 V(t, X)}{\partial X^2} + \mu(t, X)\, \frac{\partial V(t, X)}{\partial X}\]


The prepayment event changes the previous problem since we must verify that the value
\(V(t, X)\) is lower than the mortgage value \(M(t)\) minus the refinancing cost \(C(t)\). The option
problem is then equivalent to solving the HJB equation or the variational inequality:

\[\min\left(\mathcal{L}_t V(t, X),\ V(t, X) + C(t) - M(t)\right) = 0\]

where:

\[\mathcal{L}_t V(t, X) = \mathcal{A}_t V(t, X) + m(t) + \partial_t V(t, X) - r(t)\, V(t, X)\]


This model can be extended to the case where there are several state variables or there is 
no maturity (perpetual mortgage). 


⁶⁶ \(r(t)\) is the discount rate.

⁶⁷ The net present value is equal to:

\[V(t) = \mathbb{E}^{\mathbb{Q}}\left[\int_t^{T} m(u)\, e^{-\int_t^u r(s)\,\mathrm{d}s}\,\mathrm{d}u + e^{-\int_t^{T} r(s)\,\mathrm{d}s}\, N(T) \,\middle|\, \mathcal{F}_t\right]\]

where \(N(T)\) is the outstanding amount at the maturity.

⁶⁸ We use the Feynman-Kac representation given on page 1070.

The Agarwal-Driscoll-Laibson model There are several possible specifications
depending on the choice of the state variables, the dynamics of interest rates, etc. For example,
using a framework similar to the previous one, Agarwal et al. (2013) propose the following
optimal refinancing rule:

\[i_0 - i(t) \ge \delta^\star = \frac{1}{\psi}\left(\phi + W\left(-e^{-\phi}\right)\right) \tag{7.40}\]

where \(W(x)\) is the Lambert W function⁶⁹, \(\psi = \sigma^{-1}\sqrt{2\,(r + \lambda)}\) and \(\phi = 1 + \psi\,(r + \lambda)\,(C/M)\).
The parameters are the real discount rate \(r\), the rate \(\lambda\) of exogenous mortgage prepayment,
the volatility \(\sigma\) of the mortgage rate \(i(t)\), the refinancing cost \(C\) and the remaining mortgage
value \(M\). Equation (7.40) has been obtained by solving the HJB equation and assuming
that \(\mathrm{d}X(t) = \sigma\,\mathrm{d}W(t)\) and \(X(t) = i(t) - i_0\).

Using the numerical values \(r = 5\%\), \(\lambda = 10\%\), \(\sigma = 2\%\), and \(C/M = 1\%\), \(\delta^\star\) is equal to 110
bps. This means that the borrower has to prepay his mortgage if the mortgage rate falls by
at least 110 bps. In Table 7.20, we consider the impact of one parameter by considering the
other parameters unchanged. First, we assume that the cost function is \(C = 2000 + 1\% \times M\),
meaning that there is a fixed cost of $2 000. It follows that \(\delta^\star\) is a decreasing function of the
mortgage value \(M\), because fixed costs penalize low mortgage values. We also verify that \(\delta^\star\)
is an increasing function of \(r\), \(\sigma\) and \(\lambda\). In particular, the parameter \(\sigma\) has a big influence,
because it indicates if the mortgage rate is volatile or not. In the case of a high volatility,
it may be optimal for the borrower to wait until \(i(t)\) has decreased substantially. This is why the
HJB equation finds a high value of \(\delta^\star\).


TABLE 7.20: Optimal refinancing rule δ* (in bps)

M (in KUSD)   δ*  |    r    δ*  |    σ    δ*  |    λ    δ*
         10  612  |   1%   101  |   1%    79  |   2%    89
        100  198  |   2%   103  |   2%   110  |   5%    98
        250  150  |   5%   110  |   3%   133  |  10%   110
        500  131  |   8%   116  |   5%   171  |  15%   120
       1000  121  |  10%   120  |  10%   239  |  20%   128
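
The refinancing threshold can be recomputed with a few lines of Python. This is a minimal sketch: the Lambert W function is evaluated by bisection on its principal branch rather than with a library routine, and the function names are illustrative.

```python
import math


def lambert_w0(x, tol=1e-14):
    """Principal branch of the Lambert W function on [-1/e, 0], by bisection.

    w * exp(w) is increasing on [-1, 0], so the root of w * exp(w) = x is unique there.
    """
    if not -1.0 / math.e <= x <= 0.0:
        raise ValueError("this sketch only covers -1/e <= x <= 0")
    lo, hi = -1.0, 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def refinancing_threshold(r, lam, sigma, cost_ratio):
    """Threshold delta* = (phi + W(-exp(-phi))) / psi, with cost_ratio = C / M."""
    psi = math.sqrt(2.0 * (r + lam)) / sigma
    phi = 1.0 + psi * (r + lam) * cost_ratio
    return (phi + lambert_w0(-math.exp(-phi))) / psi


base = refinancing_threshold(0.05, 0.10, 0.02, 0.01)
# Fixed cost of $2000 plus 1% of a $100000 mortgage, i.e. C / M = 3%.
with_fixed_cost = refinancing_threshold(0.05, 0.10, 0.02, 3000.0 / 100000.0)
print(f"base case: {1e4 * base:.0f} bps, M = 100 KUSD: {1e4 * with_fixed_cost:.0f} bps")
```

Because \(\phi > 1\) whenever the cost ratio is positive, the argument of \(W\) always lies in \((-1/e, 0)\), so the principal branch is well defined.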


7.3.2.3 Reduced-form models 


Rate, coupon or maturity incentive? The previous approach can only be applied to 
the refinancing decision, but it cannot deal with all types of prepayment. Moreover, there is 
no guarantee that the right decision variable is the difference \(i_0 - i(t)\) between the current
mortgage rate and the initial mortgage rate. For instance, \(i_0 - i(t) = 1\%\) implies a high
impact for a 20-year remaining maturity, but has a small effect when the maturity is less
than one year. A better decision variable is the coupon or annuity paid by the borrower. In
the case of a constant payment mortgage, we recall that the annuity is equal to:

\[A(i, n) = \frac{i}{1 - (1+i)^{-n}}\, N_0\]

where \(N_0\) is the notional of the mortgage, \(i\) is the mortgage rate and \(n\) is the number of
periods. If the mortgage rate drops from \(i_0\) to \(i(t)\), the absolute difference of the annuity is
equal to \(D_A(i_0, i(t)) = A(i_0, n) - A(i(t), n)\), whereas the relative difference of the annuity
is given by:

\[D_R(i_0, i(t)) = \frac{D_A(i_0, i(t))}{A(i_0, n)} = 1 - \left(\frac{1 - (1+i_0)^{-n}}{1 - (1+i(t))^{-n}}\right) \frac{i(t)}{i_0}\]

where \(n\) is the remaining number of periods. In a similar way, the relative cumulative
difference \(\mathcal{C}(i_0, i(t))\) is equal to:

\[\mathcal{C}(i_0, i(t)) = \frac{n \cdot D_A(i_0, i(t))}{N_0} = n \left(\frac{i_0}{1 - (1+i_0)^{-n}} - \frac{i(t)}{1 - (1+i(t))^{-n}}\right)\]

Finally, another interesting measure is the maximum number of periods \(N(i_0, i(t))\) such
that the new annuity is greater than or equal to the initial annuity:

\[N(i_0, i(t)) = \left\{x \in \mathbb{N} : A(i(t), x) \ge A(i_0, n),\ A(i(t), x+1) < A(i_0, n)\right\}\]

where \(N(i_0, i(t))\) measures the maturity reduction of the loan by assuming that the borrower
continues to pay the same annuity.

⁶⁹ The Lambert W function is related to Shannon’s entropy and satisfies \(W(x)\, e^{W(x)} = x\).


TABLE 7.21: Impact of a new mortgage rate (100 KUSD, 5%, 10-year)

   i        A        D_A (in $)        D_R       C          N
(in %)   (in $)   Monthly  Annually   (in %)   (in %)   (in years)
  5.0     1061
  4.5     1036       24       291       2.3      2.9       9.67
  4.0     1012       48       578       4.5      5.8       9.42
  3.5      989       72       862       6.8      8.6       9.17
  3.0      966       95      1141       9.0     11.4       8.92
  2.5      943      118      1415      11.1     14.2       8.75
  2.0      920      141      1686      13.2     16.9       8.50
  1.5      898      163      1953      15.3     19.5       8.33
  1.0      876      185      2215      17.4     22.2       8.17
  0.5      855      206      2474      19.4     24.7       8.00

Let us illustrate the impact of a new rate \(i(t)\) on an existing mortgage. We assume
that the current outstanding amount is equal to $100000 and the amortization scheme is
monthly. In Table 7.21, we show how the monthly annuity changes if the original rate is 5%
and the remaining maturity is ten years. If the borrower refinances the mortgage at 2%, the
monthly annuity is reduced by $141, which represents 13.2% of the current monthly coupon.
His total gain is then equal to 16.9% of the outstanding amount. If the borrower prefers
to reduce the maturity and keeps the annuity constant, he will gain 18 months. In Tables
7.22 and 7.23, we compute the same statistics when the remaining maturity is twenty years
or the original rate is 10%. Banks have already experienced this kind of situation over the
last 30 years. For example, we report the average rate of 30-year and 15-year fixed rate
mortgages in the US in Figure 7.20. We also calculate the differential rate between the
30-year mortgage rate lagged 15 years and the 15-year mortgage rate. We notice that this
refinancing opportunity reached 10% and more in the 1990s, and was above 3% most
of the time over the last 25 years. Of course, this situation is exceptional and explained by 30
years of falling interest rates.


TABLE 7.22: Impact of a new mortgage rate (100 KUSD, 5%, 20-year)

   i        A        D_A (in $)        D_R       C          N
(in %)   (in $)   Monthly  Annually   (in %)   (in %)   (in years)
  5.0      660
  4.5      633       27       328       4.1      6.6      18.67
  4.0      606       54       648       8.2     13.0      17.58
  3.5      580       80       960      12.1     19.2      16.67
  3.0      555      105      1264      16.0     25.3      15.83
  2.5      530      130      1561      19.7     31.2      15.17
  2.0      506      154      1849      23.3     37.0      14.50
  1.5      483      177      2129      26.9     42.6      14.00
  1.0      460      200      2401      30.3     48.0      13.50
  0.5      438      222      2664      33.6     53.3      13.00


TABLE 7.23: Impact of a new mortgage rate (100 KUSD, 10%, 10-year)

   i        A        D_A (in $)        D_R       C          N
(in %)   (in $)   Monthly  Annually   (in %)   (in %)   (in years)
 10.0     1322
  9.0     1267       55       657       4.1      6.6       9.33
  8.0     1213      108      1299       8.2     13.0       8.75
  7.0     1161      160      1925      12.1     19.3       8.33
  6.0     1110      211      2536      16.0     25.4       7.92
  5.0     1061      261      3130      19.7     31.3       7.58
  4.0     1012      309      3709      23.3     37.1       7.25
  3.0      966      356      4271      26.9     42.7       6.92
  2.0      920      401      4816      30.4     48.2       6.67
  1.0      876      445      5346      33.7     53.5       6.50
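
The annuity-based measures reported in Tables 7.21 to 7.23 can be recomputed with the following Python sketch. It assumes monthly payments, a strictly positive annual rate divided by twelve to obtain the periodic rate, and a notional of $100000; the function names are illustrative.

```python
def annuity(rate, n, notional=100_000.0, periods_per_year=12):
    """Constant payment per period for an annual rate > 0 and n remaining periods."""
    i = rate / periods_per_year
    return notional * i / (1.0 - (1.0 + i) ** -n)


def refinancing_gains(rate_old, rate_new, n, notional=100_000.0, periods_per_year=12):
    """Return (A_old, A_new, D_A, D_R, C, N) for a refinancing at rate_new."""
    a_old = annuity(rate_old, n, notional, periods_per_year)
    a_new = annuity(rate_new, n, notional, periods_per_year)
    if notional * rate_new / periods_per_year >= a_old:
        raise ValueError("the old annuity does not cover the interest at the new rate")
    d_abs = a_old - a_new
    d_rel = d_abs / a_old
    cumulative = n * d_abs / notional
    # Largest number of periods for which the new annuity still covers the old one.
    x = 1
    while annuity(rate_new, x + 1, notional, periods_per_year) >= a_old:
        x += 1
    return a_old, a_new, d_abs, d_rel, cumulative, x


a_old, a_new, d_abs, d_rel, cum, n_new = refinancing_gains(0.05, 0.02, 120)
print(f"A = {a_old:.0f} -> {a_new:.0f}, D_A = {d_abs:.0f} per month, "
      f"D_R = {d_rel:.1%}, C = {cum:.1%}, N = {n_new / 12:.2f} years")
```

For the 5%, 10-year mortgage refinanced at 2%, the sketch returns a new maturity of 102 months, that is a gain of 18 months.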


Survival function with prepayment risk Previously, we have defined the amortization
function \(S(t, u)\) as the fraction of the new production at time \(t\) that still remains in
the balance sheet at time \(u \ge t\): \(\mathrm{NP}(t, u) = \mathrm{NP}(t)\, S(t, u)\). We have seen that \(S(t, u)\)
corresponds to a survival function. Therefore, we can use the property that the product of \(n_S\)
survival functions is a survival function, meaning that we can decompose \(S(t, u)\) as follows:

\[S(t, u) = \prod_{j=1}^{n_S} S_j(t, u)\]

This implies that the hazard rate is an additive function:

\[\lambda(t, u) = \sum_{j=1}^{n_S} \lambda_j(t, u)\]

because we have:

\[e^{-\int_t^u \lambda(t, s)\,\mathrm{d}s} = \prod_{j=1}^{n_S} e^{-\int_t^u \lambda_j(t, s)\,\mathrm{d}s} = e^{-\int_t^u \left(\sum_{j=1}^{n_S} \lambda_j(t, s)\right)\mathrm{d}s}\]

If we apply this result to prepayment, we have:

\[S(t, u) = S_c(t, u) \cdot S_p(t, u)\]

where \(S_c(t, u)\) is the traditional amortization function (or the contract-based survival
function) and \(S_p(t, u)\) is the prepayment-based survival function.

[Figure 7.20 panels: 30-year and 15-year mortgage rates (in %), 1970-2020; rate differential \(\Delta i(t) = i_{30Y}(t - 15Y) - i_{15Y}(t)\) (in %), 1995-2020]

FIGURE 7.20: Evolution of 30-year and 15-year mortgage rates in the US

Source: Freddie Mac, 30Y/15Y Fixed Rate Mortgage Average in the United States
[MORTGAGE30US/15US], retrieved from FRED, Federal Reserve Bank of St. Louis;
https://fred.stlouisfed.org/series/MORTGAGE30US, July 24, 2019.


Example 75 We consider a constant amortization mortgage (CAM) and assume that the
prepayment-based hazard rate is constant and equal to \(\lambda_p\).


In Exercise 7.4.3 on page 450, we show that the survival function is equal to:

\[S_c(t, u) = \mathbb{1}\{t \le u \le t + m\} \cdot \frac{1 - e^{-i\,(t+m-u)}}{1 - e^{-i\,m}}\]

It follows that:

\[\begin{aligned}
\lambda_c(t, u) &= -\frac{\partial \ln S_c(t, u)}{\partial u} \\
&= \frac{\partial \ln\left(1 - e^{-i\,m}\right)}{\partial u} - \frac{\partial \ln\left(1 - e^{-i\,(t+m-u)}\right)}{\partial u} \\
&= \frac{i\, e^{-i\,(t+m-u)}}{1 - e^{-i\,(t+m-u)}} \\
&= \frac{i}{e^{i\,(t+m-u)} - 1}
\end{aligned}\]

Finally, we deduce that:

\[\lambda(t, u) = \mathbb{1}\{t \le u \le t + m\} \cdot \left(\frac{i}{e^{i\,(t+m-u)} - 1} + \lambda_p\right)\]

In Figure 7.21, we report the survival function \(S(t, u)\) and the hazard rate \(\lambda(t, u)\) of a 30-year
mortgage at 5%. We also compare the amortization function \(S(t, u)\) obtained in continuous-
time with the function calculated when we assume that the coupon is paid monthly. We
notice that the continuous-time model is a good approximation of the discrete-time model.
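
The continuous-time survival function and hazard rate of Example 75 can be evaluated with the Python sketch below. It assumes a constant prepayment hazard `lam_p`, so that the prepayment-based survival function is exponential; the function names and the value of 10% used for `lam_p` are illustrative.

```python
import math


def survival_contract(t, u, i, m):
    """Amortization function S_c(t, u) for a loan issued at t with maturity m."""
    if not t <= u <= t + m:
        return 0.0
    return (1.0 - math.exp(-i * (t + m - u))) / (1.0 - math.exp(-i * m))


def survival_total(t, u, i, m, lam_p):
    """S(t, u) = S_c(t, u) * S_p(t, u) with a constant prepayment hazard lam_p."""
    return survival_contract(t, u, i, m) * math.exp(-lam_p * (u - t))


def hazard_total(t, u, i, m, lam_p):
    """Contractual hazard i / (exp(i (t + m - u)) - 1) plus lam_p, for t <= u < t + m."""
    return i / math.expm1(i * (t + m - u)) + lam_p


# 30-year mortgage at 5% with an assumed prepayment hazard rate of 10%.
for u in (0, 5, 10, 20, 29):
    print(f"u = {u:2d}: S = {survival_total(0, u, 0.05, 30, 0.10):.3f}, "
          f"hazard = {hazard_total(0, u, 0.05, 30, 0.10):.3f}")
```

The hazard rate diverges as \(u\) approaches the maturity date, because the contractual amortization removes the remaining balance over a vanishing time interval.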


[Figure 7.21 panels: S(t, u) (in %) with a monthly annuity; S(t, u) (in %) with a continuous-time annuity; hazard rate λ(t, u) (in %). Horizontal axis: time (in years), from 0 to 30]

FIGURE 7.21: Survival function in the case of prepayment


Specification of the hazard function. It is unrealistic to assume that the hazard function
$\lambda_p(t,u)$ is constant because we do not make the distinction between economic and
structural prepayments. In fact, it is better to decompose $S_p(t,u)$ into the product of two
survival functions:

$$S_p(t,u) = S_{\mathrm{refinancing}}(t,u) \cdot S_{\mathrm{turnover}}(t,u)$$

where $S_{\mathrm{refinancing}}(t,u)$ corresponds to economic prepayments due to refinancing decisions
and $S_{\mathrm{turnover}}(t,u)$ corresponds to structural prepayments because of housing turnover. In
this case, we can assume that $\lambda_{\mathrm{turnover}}(t,u)$ is constant and corresponds to the housing
turnover rate. The specification of $\lambda_{\mathrm{refinancing}}(t,u)$ is more complicated since it depends on
several factors. For instance, Elie et al. (2002) show that $\lambda_{\mathrm{refinancing}}(t,u)$ depends on the
loan characteristics (type, age and balance), the cost of refinancing and the market rates.
Moreover, they observe a seasonality in prepayment rates, which differs with respect to the
loan type (monthly, quarterly or semi-annually).

As for deposit balances, the ‘Net Portfolio Value Model’ published by the Office of Thrift
Supervision (2001) gives very precise formulas for measuring prepayment. They assume that
the prepayment rate is made up of three factors:

$$\lambda_p(t,u) = \lambda_{\mathrm{age}}(u-t) \cdot \lambda_{\mathrm{seasonality}}(u) \cdot \lambda_{\mathrm{rate}}(u)$$

where $\lambda_{\mathrm{age}}$ measures the impact of the loan age, $\lambda_{\mathrm{seasonality}}$ corresponds to the seasonality
factor and $\lambda_{\mathrm{rate}}$ represents the influence of market rates. The first two components are
specified as follows:

$$\lambda_{\mathrm{age}}(\mathrm{age}) = \begin{cases} 0.4 \cdot \mathrm{age} & \text{if } \mathrm{age} \le 2.5 \\ 1 & \text{if } \mathrm{age} \ge 2.5 \end{cases}$$

and:

$$\lambda_{\mathrm{seasonality}}(u) = 1 + 0.20 \times \sin\left(1.571 \times \left(\frac{12 + \mathrm{month}(u) - 3}{3} - 1\right)\right)$$


where $\mathrm{age} = u - t$ is the loan age and $\mathrm{month}(u)$ is the month of the date $u$. We notice
that $\lambda_{\mathrm{age}}$ is equal to zero for a new mortgage ($u - t = 0$), increases linearly with mortgage
age and remains constant after 30 months or 2.5 years. The refinancing factor of the OTS
model has the following expression:

$$\lambda_{\mathrm{rate}}(u) = \beta_0 + \beta_1 \arctan\left(\beta_2 \cdot \left(\beta_3 - \frac{i_0}{i(u - 0.25)}\right)\right)$$

where $i(u - 0.25)$ is the mortgage refinancing rate (lagged three months). In Figure 7.22,
we represent the three components⁷⁰ while Figure 7.23 provides an example of the survival
function $S_p(t,u)$ where the mortgage rate drops from 5% to 1% after 6 years. The seasonality
component has a small impact on the survival function because it is smoothed when
computing the cumulative hazard function. On the contrary, the age and rate components
change the prepayment speed.
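The three OTS components and the resulting survival function can be sketched numerically as follows. This is only an illustration under stated assumptions: the seasonality frequency is read as 1.571 (about π/2), the rate factor is read as depending on the ratio between the original rate and the lagged refinancing rate, the coefficients are the default values quoted in the text's footnote, and the rate path (5% dropping to 1% after 6 years), the January start date and the function names are hypothetical choices.

```python
import math

# Default OTS (2001) coefficients as quoted in the text; i0 is assumed to be the original rate.
BETA0, BETA1, BETA2, BETA3 = 0.2406, -0.1389, 5.952, 1.049

def lambda_age(age):
    """Age ramp: linear up to 2.5 years, then flat at 1."""
    return min(0.4 * age, 1.0)

def lambda_seasonality(month):
    """Seasonality factor for a calendar month in 1..12."""
    return 1.0 + 0.20 * math.sin(1.571 * ((12 + month - 3) / 3.0 - 1.0))

def lambda_rate(i_lagged, i0=0.05):
    """Refinancing factor as a function of the lagged refinancing rate."""
    return BETA0 + BETA1 * math.atan(BETA2 * (BETA3 - i0 / i_lagged))

def refinancing_rate(u, drop_date=6.0, high=0.05, low=0.01):
    """Hypothetical rate path: 5% until drop_date, 1% afterwards."""
    return high if u < drop_date else low

def prepayment_survival(u, t=0.0, steps_per_year=120):
    """S_p(t, u) = exp(-integral of lambda_p(t, s) ds), using the midpoint rule."""
    n = max(1, int(round((u - t) * steps_per_year)))
    ds = (u - t) / n
    cumulative_hazard = 0.0
    for k in range(n):
        s = t + (k + 0.5) * ds
        month = int((s % 1.0) * 12) + 1  # assumption: date 0 is January 1st
        lam = (lambda_age(s - t)
               * lambda_seasonality(month)
               * lambda_rate(refinancing_rate(s - 0.25)))
        cumulative_hazard += lam * ds
    return math.exp(-cumulative_hazard)

if __name__ == "__main__":
    for years in (1, 3, 6, 8, 10):
        print(years, round(prepayment_survival(years), 4))
```

With these inputs, the prepayment speed roughly doubles once the lagged refinancing rate reflects the drop to 1%, while the seasonality factor averages out over each year.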


[Figure 7.22, three panels: $\lambda_{\mathrm{age}}$ against age (in years); $\lambda_{\mathrm{seasonality}}$ against month; $\lambda_{\mathrm{rate}}$ against current mortgage rate (in %)]

FIGURE 7.22: Components of the OTS model


⁷⁰ For the specification of $\lambda_{\mathrm{rate}}$, we use the default values of OTS (2001, Equation 5.A.7): $\beta_0 = 0.2406$,
$\beta_1 = -0.1389$, $\beta_2 = 5.952$, and $\beta_3 = 1.049$. We also assume that $i_0 = 5\%$.

FIGURE 7.23: An example of survival function $S_p(t,u)$ with a mortgage rate drop


7.3.2.4 Statistical measure of prepayment 


In fact, the OTS model does not use the concept of hazard rate, but defines the constant
prepayment rate CPR, which is the annualized rate of the single monthly mortality (SMM):

$$\mathrm{SMM} = \frac{\text{prepayments during the month}}{\text{outstanding amount at the beginning of the month}}$$

The CPR and the SMM are then related by the following equation:

$$\mathrm{CPR} = 1 - (1 - \mathrm{SMM})^{12}$$


In IRRBB, the CPR is also known as the conditional prepayment rate. It measures prepayments
as a percentage of the current outstanding balance for the next year. By definition,
it is related to the hazard function as follows:

$$\begin{aligned}
\mathrm{CPR}(u,t) &= \Pr\{u < \tau \le u+1 \mid \tau \ge u\} \\
&= \frac{S_p(t,u) - S_p(t,u+1)}{S_p(t,u)} \\
&= 1 - \exp\left(-\int_u^{u+1} \lambda_p(t,s)\, \mathrm{d}s\right)
\end{aligned}$$

If $\lambda_p(t,s)$ is constant and equal to $\lambda_p$, we can approximate the CPR by the hazard rate $\lambda_p$
because we have $\mathrm{CPR}(u,t) = 1 - e^{-\lambda_p} \approx \lambda_p$.
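The relationships between SMM, CPR and a constant hazard rate are easy to check numerically. The following sketch uses hypothetical function names and simply restates the two formulas above; it is meant as an illustration, not as a reference implementation.

```python
import math

def cpr_from_smm(smm):
    """Annualize a single monthly mortality: CPR = 1 - (1 - SMM)^12."""
    return 1.0 - (1.0 - smm) ** 12

def smm_from_cpr(cpr):
    """Inverse relationship: SMM = 1 - (1 - CPR)^(1/12)."""
    return 1.0 - (1.0 - cpr) ** (1.0 / 12.0)

def cpr_from_hazard(lam_p):
    """CPR implied by a constant prepayment hazard rate: 1 - exp(-lam_p)."""
    return -math.expm1(-lam_p)

if __name__ == "__main__":
    for lam in (0.05, 0.10, 0.20, 0.35):
        cpr = cpr_from_hazard(lam)
        print(f"hazard={lam:.2f}  CPR={cpr:.4f}  SMM={smm_from_cpr(cpr):.4f}")
```

The gap between the CPR and the hazard rate is of second order in the hazard rate, so the approximation is tight for low prepayment speeds and deteriorates for the high speeds observed in refinancing waves.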

We use the prepayment monitoring report published by the Federal Housing Finance
Agency (FHFA). From 2008 to 2018, the CPR for 30-year mortgages varies between 5%
and 35% in the US. The lowest value is reached at the end of 2008. This shows clearly that
prepayments depend on the economic cycle. During a crisis, the number of defaults increases
while the number of prepayments decreases. This implies that there is a negative correlation
between default and prepayment rates. However, there is a high heterogeneity depending
on the coupon rate and the issuance date as shown in Table 7.24. We generally observe
that the CPR increases with the coupon rate. For example, in June 2018, the CPR is 7
percentage points greater for a 30-year mortgage issued between 2012 and 2016 with a 4.5%
coupon than with a 3% coupon. We also verify the ramp effect because the prepayment rate
is not of the same magnitude before and after January 2017, which corresponds to the
30-month age after which the prepayment rate can be assumed to be constant. This is why
the CPR is only 5.3% and 12.8% for mortgages issued in 2018 and 2017 while it is equal to
17.4% for mortgages issued in 2016 when the coupon rate is 4.5%.


TABLE 7.24: Conditional prepayment rates in June 2018 by coupon rate and issuance date

Year             2012    2013    2014    2015    2016    2017    2018
Coupon = 3%      9.6%   10.2%   10.9%   10.0%    8.7%    5.3%    3.1%
Coupon = 4.5%   16.1%   15.8%   16.6%   17.9%   17.4%   12.8%    5.3%
Difference       6.5%    5.6%    5.7%    8.0%    8.7%    7.6%    2.2%

Source: RiskSpan dataset, FHFA (2018) and author’s calculations.


7.3.3 Redemption risk 
7.3.3.1 The funding risk of term deposits 


A term deposit, also known as time deposit or certificate of deposit (CD), is a fixed-term
cash investment. The client deposits a minimum sum of money into a banking account in
exchange for a fixed rate over a specified period. A term deposit is then defined by three
variables: the deposit or CD rate i(t), the maturity period m and the minimum balance
D⁻. For example, the minimum deposit is generally $1000 in the US, and the typical
maturities are 1M, 3M, 6M, 1Y, 2Y and 3Y. In some banks, the deposit rate may depend
on the deposit amount⁷¹. Together with demand deposits and savings accounts, term deposits
are an important source of bank funding. However, they differ from non-maturity deposits
because they have a fixed maturity, their rates are higher and they may be redeemed with
a penalty. When buying a term deposit, the investor can normally withdraw the funds only
after the term ends. This is why CD rates are generally greater than NMD rates, because term
deposits are a more stable funding resource for banks. Moreover, CD rates are generally
more sensitive to market interest rates than NMD rates, because a term deposit is more
an investment product while a demand deposit is more a transaction account. Under some
conditions, the investor may withdraw his term deposit before the maturity date if he pays
early redemption costs and fees, which generally correspond to a reduction of the deposit
rate. For example, i(t) may be reduced by 80% if the remaining maturity is greater than
50% of the CD maturity and by 30% if the remaining maturity is less than 20% of the CD
maturity.

According to Gilkeson et al. (1999), early time deposit withdrawals may be motivated
by two reasons. As for prepayments, the first reason is economic. If market interest rates
rise, the investor may have a financial incentive to close his old term deposit and reinvest his
money into a new term deposit. In this case, the investor is sensitive to the rate differential
i(t) − i₀ where i₀ is the original CD rate and i(t) is the current CD rate. Early withdrawal
risk can then be viewed as the opposite of prepayment risk. Indeed, while
the economic reason of prepayment risk is a fall of interest rates, the economic reason of
redemption risk is a rise of interest rates. While both risks have a negative impact on the
net interest income, their impact on the liquidity risk is different: the bank receives cash in
case of a prepayment, while the funding of the bank is reduced in case of redemption. The
second reason is related to negative liquidity shocks of depositors. For example, the client
may need to get his money back because of life events: job loss, divorce, revenue decline,
etc. In this case, redemption risk is explained by idiosyncratic liquidity shocks that are
independent and can be measured by a structural constant rate. But redemption risk can
also be explained by systemic liquidity shocks. For example, economic crises increase the
likelihood of early withdrawals. In this case, we cannot assume that the redemption rate is
constant because it depends on the economic cycle.

⁷¹ For example, Chase defines six CD rates for a given maturity and considers the following bands:
below $10K, $10K - $25K, $25K - $50K, $50K - $100K, $100K - $250K and $250K+ (source:
https://www.chase.com/personal/savings/bank-cd).


7.3.3.2 Modeling the early withdrawal risk 


Redemption risk can be measured using the same approach we have used for prepayment
risk. This is particularly true for the economic component and the idiosyncratic liquidity
component. The systemic component of negative liquidity shocks requires a more appropriate
analysis and makes the modeling more challenging. Another difficulty with the early
withdrawal risk is the scarcity of academic models, professional publications and data. To
our knowledge, there are only five academic publications on this topic and only three articles
that give empirical results⁷²: Cline and Brooks (2004), Gilkeson et al. (1999) and Gilkeson
et al. (2000).

The redemption-based survival function of time deposits can be decomposed as:

$$S_r(t,u) = S_{\mathrm{economic}}(t,u) \cdot S_{\mathrm{liquidity}}(t,u)$$

where $S_{\mathrm{economic}}(t,u)$ is the amortization function related to reinvestment financial incentives
and $S_{\mathrm{liquidity}}(t,u)$ is the amortization function due to negative liquidity shocks.


Let us first focus on economic withdrawals. We note $t$ the current date, $m$ the maturity
of the time deposit and $N_0$ the initial investment at time 0. In the absence of redemption,
the value of the time deposit at the maturity is equal to $V_0 = N_0 (1 + i_0)^m$. If we assume
that $\tau$ is the withdrawal time, the value of the investment for $\tau = t$ becomes:

$$V_r(t) = N_0 \cdot \left(1 + (1 - \varphi(t))\, i_0\right)^t \left(1 + i(t)\right)^{m-t} - C(t)$$

where $\varphi(t)$ is the penalty parameter applied to interest paid and $C(t)$ is the break fee.
For example, if we specify $\varphi(t) = 1 - t/m$, $\varphi(t)$ is a linear decreasing function between⁷³
$\varphi(0) = 100\%$ and $\varphi(m) = 0\%$. $C(t)$ may be a flat fee (e.g. $C(t) = \$1000$) or a
proportional fee: $C(t) = c(t) \cdot N_0$. The rational investor redeems the term deposit if the
refinancing incentive is positive:

$$\frac{V_r(t) - V_0}{N_0} > 0$$

In the case where $C(t) = c(t) N_0$, we obtain the following equivalent condition:

$$i(t) > i^\star(t) = \left(\frac{(1 + i_0)^m + c(t)}{\left(1 + (1 - \varphi(t))\, i_0\right)^t}\right)^{\frac{1}{m-t}} - 1$$

⁷² The two other theoretical publications are Stanhouse and Stock (2004), and Gao et al. (2018).

⁷³ $\varphi(0) = 100\%$ if the redemption occurs at the beginning of the contract and $\varphi(m) = 0\%$ when the term
deposit matures.
An example of this refinancing incentive rule is given in Figure 7.24. This corresponds to
a three-year term deposit whose rate is equal to 2%. The penalty applied to interest paid
is given by $\varphi(t) = 1 - t/m$. We show the impact of the fee $c(t)$ on $i^\star(t)$. We observe that
the investor has no interest to wait if the interest rate rise is sufficient. Therefore, there
is an arbitrage between the current rate $i(t)$ and the original rate $i_0$. We deduce that the
hazard function takes the following form: $\lambda_{\mathrm{economic}}(t,u) = g(i(u) - i_0)$ or
$\lambda_{\mathrm{economic}}(t,u) = g(r(u) - i_0)$ where $g$ is a function to estimate. For instance, Gilkeson et al.
(1999) consider a logistic regression model and explain withdrawal rates by the refinancing
incentive variable.

FIGURE 7.24: Refinancing incentive rule of term deposits
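The refinancing incentive rule can be sketched in Python for the three-year, 2% term deposit of the example. The sketch assumes that the redemption value per unit of notional is (1 + (1 − φ(t)) i₀)^t (1 + i(t))^(m−t) − c(t) with the linear penalty φ(t) = 1 − t/m and a proportional fee; the function names and the fee levels used in the demo are hypothetical.

```python
def penalty(t, m):
    """Linear penalty on accrued interest: 100% at inception, 0% at maturity."""
    return 1.0 - t / m

def redemption_value(t, i_t, m=3.0, i0=0.02, c=0.0):
    """Value at maturity, per unit of notional, if the investor redeems at t
    and reinvests at the current rate i_t until the original maturity m."""
    accrued = (1.0 + (1.0 - penalty(t, m)) * i0) ** t
    return accrued * (1.0 + i_t) ** (m - t) - c

def threshold_rate(t, m=3.0, i0=0.02, c=0.0):
    """Rate above which early redemption beats holding the deposit to maturity."""
    if not 0.0 <= t < m:
        raise ValueError("t must satisfy 0 <= t < m")
    accrued = (1.0 + (1.0 - penalty(t, m)) * i0) ** t
    return (((1.0 + i0) ** m + c) / accrued) ** (1.0 / (m - t)) - 1.0

if __name__ == "__main__":
    for fee in (0.0, 0.005, 0.01):  # hypothetical proportional fees c(t)
        rates = [round(threshold_rate(t, c=fee), 4) for t in (0.0, 0.5, 1.0, 2.0)]
        print(f"c={fee:.3f}", rates)
```

At inception and without any fee, the threshold coincides with the original rate of 2%; it then rises with the elapsed time and with the fee, since the investor gives up a growing amount of accrued interest.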
For early withdrawals due to negative liquidity shocks, we can decompose the hazard
function into two effects:

$$\lambda_{\mathrm{liquidity}}(t,u) = \lambda_{\mathrm{structural}} + \lambda_{\mathrm{cyclical}}(u)$$

where $\lambda_{\mathrm{structural}}$ is the structural rate of redemption and $\lambda_{\mathrm{cyclical}}(u)$ is the liquidity
component due to the economic cycle. A simple way to model $\lambda_{\mathrm{cyclical}}(u)$ is to consider a linear
function of the GDP growth.
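As an illustration of this decomposition, the sketch below uses a cyclical component that is linear in GDP growth and floored at zero so that the hazard rate stays non-negative. The coefficients, the floor, the quarterly time step and the function names are all hypothetical choices made for the example, not calibrated values.

```python
import math

def lambda_liquidity(gdp_growth, lam_structural=0.02, a=0.03, b=1.0):
    """Structural rate plus a cyclical term a - b * growth, floored at zero.

    lam_structural, a and b are hypothetical coefficients.
    """
    return lam_structural + max(0.0, a - b * gdp_growth)

def liquidity_survival(gdp_path, dt=0.25, **params):
    """Survival function over a path of GDP growth rates observed every dt years."""
    cumulative_hazard = sum(lambda_liquidity(g, **params) * dt for g in gdp_path)
    return math.exp(-cumulative_hazard)

if __name__ == "__main__":
    expansion = [0.03] * 8   # two years of 3% growth
    recession = [-0.02] * 8  # two years of -2% growth
    print(round(liquidity_survival(expansion), 4),
          round(liquidity_survival(recession), 4))
```

In this toy setting, the redemption rate reduces to the structural rate in an expansion, while a recession adds a cyclical layer and lowers the survival function of the deposits.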


7.4 Exercises 
7.4.1 Constant amortization of a loan 


We consider a loan that is repaid by annual payments. We assume that the notional of
the loan is equal to $N_0$, the maturity of the loan is $n$ and $i$ is the annual interest rate. We
note $N(t)$ the outstanding amount, $I(t)$ the interest payment, $P(t)$ the principal payment
at time $t$ and $C(t)$ the present value.

1. Let $C_0$ be the present value of an annuity $A$ that is paid annually during $n$ years.
Calculate $C_0$ as a function of $A$, $n$ and $i$.

2. Determine the constant annuity $A$ of the loan and the corresponding annuity rate
$a(n)$.

3. Calculate $I(1)$ and $P(1)$. Show that the outstanding amount $N(1)$ is equal to the
present value $C(1)$ of the constant annuity $A$ for the last $n - 1$ years.

4. Calculate the general formula of $N(t)$, $I(t)$ and $P(t)$.


7.4.2 Computation of the amortization functions S (t, u) and S* (t, u) 


In what follows, we consider a debt instrument, whose remaining maturity is equal to 
m. We note t the current date and T = t + m the maturity date. 


1. We consider a bullet repayment debt. Define its amortization function $S(t,u)$. Calculate
the survival function $S^\star(t,u)$ of the stock. Show that:

$$S^\star(t,u) = \mathbb{1}\{t \le u < t+m\} \cdot \left(1 - \frac{u-t}{m}\right)$$

in the case where the new production is constant. Comment on this result.

2. Same question if we consider a debt instrument, whose amortization rate is constant.

3. Same question if we assume⁷⁴ that the amortization function is exponential with
parameter $\lambda$.


4. Find the expression of D* (t) when the new production is constant. 
5. Calculate the durations D (t) and D* (t) for the three previous cases. 


6. Calculate the corresponding dynamics dN (t). 


7.4.3 Continuous-time analysis of the constant amortization mortgage 
(CAM) 


We consider a constant amortization mortgage, whose maturity is equal to m. We note 
i the interest rate and A the constant annuity. 


1. Let $N_0$ be the amount of the mortgage at time $t = 0$. Write the equation of $\mathrm{d}N(t)$.
Show that the annuity is equal to:

$$A = \frac{i N_0}{1 - e^{-im}}$$

Deduce that the outstanding balance at time $t$ is given by:

$$N(t) = \mathbb{1}\{t < m\} \cdot N_0 \cdot \frac{1 - e^{-i(m-t)}}{1 - e^{-im}}$$


2. Find the expression of S (t, u) and S* (t, u). 


3. Calculate the liquidity duration D (t). 


⁷⁴ By definition of the exponential amortization, we have $m = +\infty$.
7.4.4 Valuation of non-maturity deposits


This exercise is based on the model of De Jong and Wielhouwer (2003), which is an
application of the continuous-time framework of Jarrow and van Deventer (1998). The
framework below has been used by De Jong and Wielhouwer to model variable rate savings
accounts. However, it is valid for all types of non-maturity deposits (demand deposits and
savings accounts). For instance, Jarrow and van Deventer originally developed the approach
for all types of demand deposits⁷⁵.


1. Let $D(t)$ be the amount of savings accounts. We note $r(t)$ and $i(t)$ the market rate
and the interest rate paid to account holders. We define the current market value of
liabilities as follows:

$$L_0 = \mathbb{E}\left[\int_0^\infty e^{-r(t)\, t} \left(i(t) D(t) - \partial_t D(t)\right) \mathrm{d}t\right]$$

Explain the expression of $L_0$, in particular the two components $i(t) D(t)$ and $\partial_t D(t)$.

2. By considering that the short rate $r(t)$ is constant, demonstrate that:

$$L_0 = D_0 + \mathbb{E}\left[\int_0^\infty e^{-r(t)\, t} \left(i(t) - r(t)\right) D(t)\, \mathrm{d}t\right]$$

3. Calculate the current mark-to-market $V_0$ of savings accounts. How do you interpret
$V_0$?

4. Let us assume that the margin $m(t) = r(t) - i(t)$ is constant and equal to $m_0$, and
$D(t)$ is at the steady state $D_\infty$. Show that:

$$V_0 = m_0\, r_\infty^{-1} D_\infty$$

where $r_\infty$ is a parameter to determine.

5. For the specification of the deposit rate $i(t)$ and the deposit balance $D(t)$, De Jong
and Wielhouwer (2003) propose the following dynamics:

$$\mathrm{d}i(t) = \left(\alpha + \beta \left(r(t) - i(t)\right)\right) \mathrm{d}t$$

and:

$$\mathrm{d}D(t) = \gamma \left(D_\infty - D(t)\right) \mathrm{d}t - \delta \left(r(t) - i(t)\right) \mathrm{d}t$$

where $\alpha$, $\beta > 0$, $\gamma > 0$ and $\delta > 0$ are four parameters. What is the rationale of these
equations? Find the general expression of $i(t)$ and $D(t)$.

6. In the sequel, the market rate $r(t)$ is assumed to be constant and equal to $r_0$. Deduce
the value of $i(t)$ and $D(t)$.

7. Calculate the net asset value $V_0$ and deduce its sensitivity with respect to the market
rate $r_0$ when $\alpha = 0$.

8. Find the general expression of the sensitivity of $V_0$ with respect to the market rate $r_0$
when $\alpha \ne 0$. Deduce the duration $D_D$ of the deposits.


⁷⁵ Janosi et al. (1999) provide an empirical analysis of the Jarrow-van Deventer model for negotiable
orders of withdrawal accounts (NOW), passbook accounts, statement accounts and demand deposit accounts
(DDAs), whereas Kalkbrener and Willing (2004) consider an application to savings accounts. Generally,
these different accounts differ with respect to the specification of interest paid $i(t)$ and the dynamics of the
deposit amount $D(t)$.
9. We consider a numerical application of the De Jong-Wielhouwer model with the
following parameters: $r_0 = 10\%$, $i_0 = 5\%$, $D_0 = 100$, $D_\infty = 150$, $\beta = 0.5$, $\gamma = 0.7$
and $\delta = 0.5$. Make a graph to represent the relationship between the time $t$ and the
deposit rate $i(t)$ when $\alpha$ is equal to −1%, 0 and 1%. Why is it natural to consider
that $\alpha < 0$? We now assume that $\alpha = -1\%$. Draw the dynamics of $D(t)$. What are
the most important parameters that impact $D(t)$? What is the issue if we calculate
the duration of the deposits with respect to $r_0$ when $\alpha$ is equal to zero? Make a graph
to represent the relationship between the market rate $r_0$ and the duration when $\alpha$ is
equal to −50 bps, −1% and −2%.


7.4.5 Impact of prepayment on the amortization scheme of the CAM 


This is a continuation of Exercise 7.4.3 on page 450. We recall that the outstanding
balance at time $t$ is given by:

$$N(t) = \mathbb{1}\{t < m\} \cdot N_0 \cdot \frac{1 - e^{-i(m-t)}}{1 - e^{-im}}$$

1. Find the dynamics $\mathrm{d}N(t)$.

2. We note $\tilde{N}(t)$ the modified outstanding balance that takes into account the prepayment
risk. Let $\lambda_p(t)$ be the prepayment rate at time $t$. Write the dynamics of $\tilde{N}(t)$.

3. Show that $\tilde{N}(t) = N(t) \cdot S_p(t)$ where $S_p(t)$ is the prepayment-based survival function.

4. Calculate the liquidity duration $D(t)$ associated to the outstanding balance $\tilde{N}(t)$
when the hazard rate of prepayments is constant and equal to $\lambda_p$.


Chapter 8 


Systemic Risk and Shadow Banking System 


The financial crisis of 2008 is above all a crisis of the financial system as a whole. This is 
why it is called the Global Financial Crisis (GFC) and is different from the previous crises
(the Great Depression in the 1930s, the Japan crisis in the early 1990s, the Black Monday of 
1987, the 1997 Asian financial crisis, etc.). It is a superposition of the 2007 subprime crisis, 
affecting primarily the mortgage and credit derivative markets, and a liquidity funding 
crisis following the demise of Lehman Brothers, which affected the credit market and more 
broadly the shadow banking system. This crisis was not limited to the banking system, but 
has affected the different actors of the financial sector, in particular insurance companies, 
asset managers and of course investors. As we have seen in the previous chapters, this led to 
a strengthening of financial regulation, and not only on the banking sector. The purpose of 
new regulations in banks, insurance, asset management, pension funds and organization of 
the financial market is primarily to improve the rules of each sector, but also to reduce the 
overall systemic risk of the financial sector. In this context, systemic risk is now certainly 
the biggest concern of financial regulators and the Financial Stability Board (FSB) was 
created in April 2009 especially to monitor the stability of the global financial system and 
to manage the systemic risk¹. It rapidly became clear that the identification of the systemic
risk is a hard task and can only be conducted in a gradual manner. This is why some 
policy responses are not yet finalized, in particular with the emergence of a shadow banking 
system, whose borders are not well defined. 


8.1 Defining systemic risk 
The Financial Stability Board defines systemic events in broad terms: 


“Systemic event is the disruption to the flow of financial services that is (i) 
caused by an impairment of all or parts of the financial system and (ii) has 
the potential to have serious negative consequences on the real economy” (FSB, 
2009, page 6). 


¹ The FSB is the successor to the Financial Stability Forum (FSF), which was founded in 1999 by the
G7 Finance Ministers and Central Bank Governors. With an expanded membership to the G20 countries,
the mandate of the FSB has been reinforced with the creation of three Standing Committees:

• the Standing Committee on Assessment of Vulnerabilities (SCAV), which is the FSB’s main mechanism
for identifying and assessing risks;
• the Standing Committee on Supervisory and Regulatory Cooperation (SRC), which is charged with
undertaking further supervisory analysis and framing a regulatory or supervisory policy response to
a material vulnerability identified by SCAV;
• the Standing Committee on Standards Implementation (SCSI), which is responsible for monitoring
the implementation of agreed FSB policy initiatives and international standards.

Like the Basel Committee on Banking Supervision, the secretariat to the Financial Stability Board is hosted
by the Bank for International Settlements and located in Basel.
This definition focuses on three important points. Firstly, systemic events are associated
with negative externalities and moral hazard risk, meaning that every financial institution’s 
incentive is to manage its own risk/return trade-off but not necessarily the implications of 
its risk on the global financial system. Secondly, a systemic event can cause the impairment 
of the financial system. Lastly, it implies significant spillovers to the real economy and 
negative effects on economic welfare. 


It is clear that the previous definition may appear too broad, but also too restrictive. It
may be too broad, because it is not precise and many events can be classified as systemic
events. It is also too restrictive, because it is difficult to identify the event that lies at the
origin of the systemic risk. Most of the time, it is caused by the combination of several
events. As noted by Zigrand (2014), systemic risk often refers to exogenous shocks, whereas
it can also be generated by endogenous shocks:


“Systemic risk comprises the risk to the proper functioning of the system as well 
as the risk created by the system” (Zigrand, 2014, page 3). 


In fact, there are numerous definitions of systemic risk because it is a multifaceted concept. 


8.1.1 Systemic risk, systematic risk and idiosyncratic risk 


In financial theory, systemic and idiosyncratic risks are generally opposed. Systemic 
risk refers to the system whereas idiosyncratic risk refers to an entity of the system. For 
instance, the banking system may collapse, because many banks may be affected by a severe 
common risk factor and may default at the same time. In economics, we generally make the 
assumption that idiosyncratic and common risk factors are independent. However, there 
exist some situations where idiosyncratic risk may affect the system itself. It is the case 
of large institutions, for example the default of big banks. In this situation, systemic risk
refers to the propagation of the distress of a single bank to the other banks.


Let us consider one of the most famous models in finance, which is the capital asset
pricing model (CAPM) developed by William Sharpe in 1964. Under some assumptions, he
showed that the expected return of asset $i$ is related to the expected return of the market
portfolio in the following way:

$$\mathbb{E}[R_{i,t}] - r = \beta_i \cdot \left(\mathbb{E}[R_{m,t}] - r\right) \qquad (8.1)$$

where $R_{i,t}$ and $R_{m,t}$ are the asset and market returns, $r$ is the risk-free rate and the
coefficient $\beta_i$ is the beta of the asset $i$ with respect to the market portfolio:

$$\beta_i = \frac{\operatorname{cov}(R_{i,t}, R_{m,t})}{\sigma^2(R_{m,t})}$$

Contrary to idiosyncratic risks, systematic risk $R_{m,t}$ cannot be diversified, and investors
are compensated for taking this risk. This means that the market risk premium is positive
($\mathbb{E}[R_{m,t}] - r > 0$) whereas the expected return of idiosyncratic risk is equal to zero. By
definition, the idiosyncratic risk of asset $i$ is equal to:

$$\tilde{\varepsilon}_{i,t} = (R_{i,t} - r) - \beta_i \cdot \left(\mathbb{E}[R_{m,t}] - r\right)$$

with $\mathbb{E}[\tilde{\varepsilon}_{i,t}] = 0$. As explained above, this idiosyncratic risk is not rewarded because it can
be hedged (or diversified). In this framework, we obtain the one-factor model given by the
following equation:

$$R_{i,t} = \alpha_i + \beta_i \cdot R_{m,t} + \varepsilon_{i,t} \qquad (8.2)$$

where $\alpha_i = (1 - \beta_i)\, r$ and $\varepsilon_{i,t} = \tilde{\varepsilon}_{i,t} - \beta_i \cdot \left(R_{m,t} - \mathbb{E}[R_{m,t}]\right)$ is a white noise process². Because
$\varepsilon_{i,t}$ is a new parametrization of the idiosyncratic risk, it is easy to show that this specific
factor is independent from the common factor $R_{m,t}$ and the other specific factors $\varepsilon_{j,t}$. If
we assume that asset returns are normally distributed, we have $R_{m,t} \sim \mathcal{N}\left(\mathbb{E}[R_{m,t}], \sigma_m^2\right)$
and:

$$\begin{pmatrix} \varepsilon_{1,t} \\ \vdots \\ \varepsilon_{n,t} \end{pmatrix} \sim \mathcal{N}\left(\mathbf{0}, \operatorname{diag}\left(\tilde{\sigma}_1^2, \ldots, \tilde{\sigma}_n^2\right)\right)$$

In the capital asset pricing model, it is obvious that the risk of the system $(R_1, \ldots, R_n)$
is due to the common risk factor also called the systematic risk factor. Indeed, a stress $S$
can only be transmitted to the system by a shock on $R_m$:

$$S(R_m) \Longrightarrow S(R_1, \ldots, R_n)$$

This is the traditional form of systemic risk. In the CAPM, idiosyncratic risks are not a
source of systemic risk:

$$S(\varepsilon_i) \;\not\!\!\Longrightarrow S(R_1, \ldots, R_n)$$

because the specific risk $\varepsilon_i$ only affects one component of the system, and not all the
components.
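The transmission of stress in the one-factor model can be illustrated with a small simulation. The sketch below relies only on the Python standard library; the betas, the idiosyncratic volatilities, the market parameters and the stress sizes are hypothetical values chosen for the example.

```python
import random
import statistics

def simulate_one_factor(betas, sigmas, n_obs=5000, r=0.02,
                        mu_m=0.06, sigma_m=0.20, seed=0):
    """Simulate R_i = alpha_i + beta_i * R_m + eps_i with alpha_i = (1 - beta_i) * r."""
    rng = random.Random(seed)
    market, assets = [], [[] for _ in betas]
    for _ in range(n_obs):
        rm = rng.gauss(mu_m, sigma_m)
        market.append(rm)
        for j, (beta, sigma) in enumerate(zip(betas, sigmas)):
            alpha = (1.0 - beta) * r
            assets[j].append(alpha + beta * rm + rng.gauss(0.0, sigma))
    return market, assets

def estimate_beta(asset, market):
    """Sample estimate of cov(R_i, R_m) / var(R_m)."""
    mean_a, mean_m = statistics.fmean(asset), statistics.fmean(market)
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset, market))
    var = sum((m - mean_m) ** 2 for m in market)
    return cov / var

def stressed_returns(betas, r=0.02, market_return=0.06, eps=None):
    """Asset returns for a given market return and given specific shocks."""
    eps = eps or [0.0] * len(betas)
    return [(1.0 - b) * r + b * market_return + e for b, e in zip(betas, eps)]

if __name__ == "__main__":
    betas, sigmas = [0.5, 1.0, 1.5], [0.10, 0.10, 0.10]
    market, assets = simulate_one_factor(betas, sigmas)
    print([round(estimate_beta(a, market), 3) for a in assets])
    print(stressed_returns(betas, market_return=-0.30))      # stress on R_m
    print(stressed_returns(betas, eps=[-0.50, 0.0, 0.0]))    # stress on eps_1
```

A shock on the market return moves every asset in proportion to its beta, whereas a shock on one specific factor leaves the other assets unchanged, which is the sense in which idiosyncratic risk is not systemic in this framework.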


In practice, systemic risk can also occur because of an idiosyncratic shock. In this case,
we distinguish two different transmission channels:

1. The first channel is the impact of a specific stress on the systematic risk factor:
$$S(\varepsilon_i) \Longrightarrow S(R_m) \Longrightarrow S(R_1, \ldots, R_n)$$
This transmission channel implies that the assumption $\varepsilon_i \perp R_m$ is not valid.

2. The second channel is the impact of a specific stress on the other specific risk factors:
$$S(\varepsilon_i) \Longrightarrow S(\varepsilon_1, \ldots, \varepsilon_n) \Longrightarrow S(R_1, \ldots, R_n)$$
This transmission channel implies that the assumption $\varepsilon_i \perp \varepsilon_j$ is not valid.


Traditional financial models (CAPM, APT) fail to capture these two channels, because they 
neglect some characteristics of systemic factors: the feedback dynamic of specific risks, the 
possibility of multiple equilibria and the network density. 


The distinction between systematic and idiosyncratic shocks is made by De Bandt and
Hartmann (2000). However, as noted by Hansen (2012), systematic risks are aggregate risks 
that cannot be avoided. A clear example is the equity risk premium. In this case, systematic 
risks are normal and inherent to financial markets and there is no reason to think that we 
can prevent them. In the systemic risk literature, common or systematic risks reflect another 
reality. They are abnormal and are viewed as a consequence of simultaneous adverse shocks 
that affect a large number of system components (De Bandt and Hartmann, 2000). In 
this case, the goal of supervisory policy is to prevent them, or at least to mitigate them. 
In practice, it is however difficult to make the distinction between these two concepts of 
systematic risk. In what follows, we will use the term systematic market risk for normal 
shocks, even if they are severe and we now reserve the term systematic risk for abnormal 
shocks. 


² $\varepsilon_{i,t}$ is a new form of the idiosyncratic risk.
8.1.2 Sources of systemic risk


De Bandt and Hartmann (2000) explained that shocks and propagation mechanisms are 
the two main elements to characterize systemic risk. If we consider our previous analysis, 
the shock corresponds to the initial stress S whereas the propagation mechanism indicates 
the transmission channel => of this initial shock. It is then useful to classify the several 
sources of systemic risk depending on the nature of the (systematic) shock or the type of 
propagation³.


8.1.2.1 Systematic shocks 


Benoit et al. (2017) list four main systematic shocks: asset-price bubble risk, correlation
risk, leverage risk and tail risk. In what follows, we give their characteristics and some
examples. However, even if these risks cover different concepts, they are also highly connected
and the boundaries between them are blurred.


Asset-price (or speculative) bubble corresponds to a situation where prices of an asset
class rise so sharply that they strongly deviate from their fundamental values⁴. The
formation of asset bubbles implies that many financial institutions (banks, insurers, asset
managers and asset owners) are exposed to the asset class, because they are momentum
investors. They prefer to ride the bubble and take advantage of the situation, because being
a contrarian investor is a risky strategy⁵. In this context, the probability of a crash occurring
increases with investors’ belief that “they can sell the asset at an even higher price in the
future” (Brunnermeier and Oehmke, 2013). Examples of speculative bubbles are the Japanese
asset bubble in the 1980s, the dot.com bubble between 1997 and 2000 and the United States
housing bubble before 2007.


Correlation risk means that financial institutions may invest in the same assets at the
same time. There are several reasons for this phenomenon. Herd behavior is an important
phenomenon in finance (Grinblatt et al., 1995; Wermers, 1999; Acharya and Yorulmazer,
2008). It corresponds to the tendency to mimic the actions of others. According to
Devenow and Welch (1996), “such herding typically arises either from direct payoff externalities
(negative externalities in bank runs; positive externalities in the generation of trading
liquidity or in information acquisition), principal-agent problems (based on managerial desire
to protect or signal reputation), or informational learning (cascades)”. Another reason
that explains correlated investments is the regulation, which may have a high impact on the
investment behavior of financial institutions. Examples include the liquidity coverage ratio,
national regulations of pension funds, Solvency II, etc. Finally, a third reason is the search
for diversification or yield. Indeed, we generally notice a strong enthusiasm at the same time
for an asset class which is considered as an investment that helps to diversify portfolios
or improve their return.


In periods of expansion, we observe an increase of leverage risk, because financial institu- 
tions want to benefit from the good times of the business cycle. As the expansion proceeds, 
investors becomes then more optimistic and the appetite for risky investments and leverage 
develops®. However, a high leverage is an issue in a stressed period, because of the drop 
of asset prices. Theoretically, the stressed loss S cannot be greater than the inverse of the 


3Concerning idiosyncratic risks, there are several sources of stress, but they can all be summarized by
the default of one of the system’s components.

4A bubble can be measured by the price-to-earnings (or P/E) ratio, which is equal to the current share
price divided by the earnings per share. For instance, stocks of the technology sector had an average price-to-earnings
ratio larger than 100 in March 2000.

5It is extremely difficult for a financial institution to miss the trend from a short-term business perspective
and to see the other financial institutions be successful.

6This is known as Minsky’s financial instability hypothesis.
financial institution’s leverage ratio LR in order to maintain its safety:


$$S \leq \frac{1}{\text{LR}}$$


For instance, in the case where LR is equal to 5, the financial institution defaults if the loss is 
greater than 20%. In practice, the stress tolerance depends also on the liquidity constraints. 
It is then easier to leverage a portfolio in a period of expansion than to deleverage it in a 
period of crisis, where we generally face liquidity problems. Geanakoplos (2010) explained
the downward spiral created by leverage by the amplification mechanism due to the demand
for collateral assets⁷. Indeed, a decline in asset prices results in asset sales by leveraged investors
because of margin call requirements, and asset sales in turn result in a decline in asset prices. Leverage
then induces non-linear and threshold effects that can create systemic risk. The failure of
LTCM is a good illustration of leverage risk (Jorion, 2000). 
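
The bound on the stressed loss can be illustrated with a short Python sketch. It assumes that LR is the ratio of total assets to equity, so that a relative loss on assets larger than 1/LR wipes out the equity; the function names are hypothetical and only serve the illustration.

```python
def max_stressed_loss(leverage_ratio: float) -> float:
    """Largest relative asset loss the equity can absorb: S <= 1 / LR.

    Assumption: LR is total assets divided by equity.
    """
    if leverage_ratio <= 0:
        raise ValueError("leverage ratio must be positive")
    return 1.0 / leverage_ratio


def defaults(loss: float, leverage_ratio: float) -> bool:
    """True when the relative loss on assets exceeds the equity cushion."""
    return loss > max_stressed_loss(leverage_ratio)


# The tolerated loss shrinks quickly as leverage increases
for lr in (2, 5, 10, 25):
    print(f"LR = {lr:>2}: default if loss > {max_stressed_loss(lr):.1%}")
```

With LR = 5, the sketch recovers the 20% threshold quoted in the text. It ignores liquidity constraints, which in practice lower the tolerated loss further.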
FIGURE 8.1: Illustration of tail risk


The concept of tail risk suggests that the decline in one asset class is abnormal with 
respect to the normal risk. This means that the probability to observe a tail event is very 
small. Generally, the normal risk is measured by the volatility. For instance, an order of 
magnitude is 20% for the long-term volatility of the equity asset class. The probability to 
observe an annual drop in equities larger than 40% is equal to 2.3%. An equity market crash 
can therefore not be assimilated to a tail event. By contrast, an asset class whose volatility is 
equal to 2.5% will experience a tail risk if the prices are 20% lower than before. In this case, 
the decrease represents eight times the annual volatility. In Figure 8.1, we have reported 
these two examples of normal and abnormal risks. When the ratio between the drawdown⁸
and the volatility is high (e.g. larger than 4), this generally indicates the occurrence of a 
tail risk. The issue with tail risks is that they are rarely observed and financial institutions 


7See Section 4.3 on page 293.
8It is equal to the maximum loss expressed in percent. 
tend to underestimate them. Acharya et al. (2010) even suggested that tail risk investments
are sought by financial institutions. Such examples are carry or short volatility strategies. 
For instance, investing in relatively high credit-quality bonds is typically a tail risk strategy. 
The rationale is to carry the default risk, to capture the spread and to hope that the default 
will never happen. However, the credit crisis in 2007-2008 showed that very low probability 
events may occur in financial markets. 
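
The two numerical examples of normal and abnormal risk can be checked with the following Python sketch. It assumes that annual returns are Gaussian with zero mean, which is consistent with the 2.3% figure quoted above but is not stated explicitly; the threshold of 4 for the drawdown-to-volatility ratio is the rule of thumb mentioned in the text, and the function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc to stay accurate in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def drop_probability(drop: float, volatility: float) -> float:
    """P(annual return < -drop) for a zero-mean Gaussian return (assumption)."""
    return normal_cdf(-drop / volatility)


def is_tail_event(drawdown: float, volatility: float, threshold: float = 4.0) -> bool:
    """Flag a tail event when the drawdown/volatility ratio exceeds the threshold."""
    return drawdown / volatility > threshold


# Equity: 40% drop with 20% volatility (2 standard deviations)
p_equity = drop_probability(0.40, 0.20)
# Low-volatility asset: 20% drop with 2.5% volatility (8 standard deviations)
p_low_vol = drop_probability(0.20, 0.025)

print(f"equity crash:   p = {p_equity:.2%}, tail event: {is_tail_event(0.40, 0.20)}")
print(f"low-vol asset:  p = {p_low_vol:.2e}, tail event: {is_tail_event(0.20, 0.025)}")
```

Under this assumption, the equity crash is a two-sigma event, whereas the 20% decline of the low-volatility asset is an eight-sigma event whose Gaussian probability is negligible.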


The distinction between the four systematic risks is rather artificial and theoretical. In 
practice, they are highly related. For instance, leverage risk is connected to tail risk. Thus, 
the carry strategy is generally implemented using leverage. Tail risk is related to bubble risk, 
which can be partially explained by the correlation risk. In fact, it is extremely difficult to 
identify a single cause, which defines the zero point of the systemic crisis. Sources of systemic 
risk are correlated, even between an idiosyncratic event and systematic risks. 


8.1.2.2 Propagation mechanisms 


As noted by De Bandt and Hartmann (2000), transmission channels of systemic risk
are certainly the main element to understand how a systemic crisis happens in an economy.
Indeed, propagation mechanisms are more important than the initial (systematic or
idiosyncratic) shock, because most shocks do not produce a systemic crisis if they do not
spread to the real economy. Among the diversity of propagation mechanisms, academics
and regulators have identified three major transmission channels: network effects, the liquidity
channel and critical function failure.


Network effects stem from the interconnectedness of financial institutions and can be 
seen as the system-wide counterpart of an institution’s counterparty risk. Network effect is 
a general term describing the transmission of a systemic shock from one particular entity 
and market to several entities or markets. In the case of LTCM, systemic risk stemmed 
from the interconnection between LTCM and the banking system combined with the high 
leverage strategy pursued by the hedge fund. This created an oversized exposure for the
banking system to counterparty credit risk from one single entity. Hence, LTCM’s idiosyn- 
cratic risk was transferred to the entire financial system and became a source of systemic 
risk. The early and influential work of Allen and Gale (2000) showed that this source of 
financial contagion is highly contingent on the network’s structure and the size of the shock. 
Their model also suggests that a fully connected network might be more resilient than an 
incomplete network, contradicting the idea that systemic risk increases with average inter- 
connectedness. However, interconnectedness of an individual entity is central to the notion 
of “being systemically important”. In the banking industry, balance sheet contagion is an 
important source of systemic risk and is linked to the counterparty credit risk. The com- 
plexity of the banking network can create domino effects and feedback loops, because the 
failure of one bank is a signal on the health of the other banks. This informational con- 
tagion is crucial to understand the freeze of the interbank market during the 2008 GFC. 
Informational contagion is also an important factor of bank runs (Diamond and Dybvig,
1983). However, network effects are not limited to the banking system. Thus, the subprime
crisis showed that they concern the different actors of the financial system. It was the case with
insurance companies⁹ and asset managers. In this last case, money market funds (MMF)
were notably impacted, forcing some unprecedented measures such as the temporary guarantee
of money market funds against losses by the US Treasury:


“Following the bankruptcy of Lehman Brothers in 2008, a well-known fund — the
Reserve Primary Fund — suffered a run due to its holdings of Lehman’s commercial
paper. This run quickly spread to other funds, triggering investors’ redemptions
of more than USD 300 billion within a few days of Lehman’s bankruptcy.
Its consequences appeared so dire to financial stability that the U.S. government
decided to intervene by providing unlimited deposit insurance to all money market
fund deposits. The intervention was successful in stopping the run but it
transferred the entire risk of the USD 3 trillion money market fund industry to
the government” (Kacperczyk and Schnabl, 2013).


9The most famous example is the bailout of AIG by the US government in late 2008.


Liquidity is another important propagation mechanism of systemic risk. For instance, 
the global financial crisis can be seen as the superposition of the subprime crisis, affecting 
primarily the mortgage and credit derivative markets and by extension the global banking 
system, and a liquidity funding crisis following the demise of Lehman Brothers, which 
affected interbank markets and more broadly the shadow banking system. In this particular 
case, the liquidity channel caused more stress than the initial systematic event of subprime 
credit. As shown previously, the concept of liquidity is multi-faceted and covers various
dimensions¹⁰ that are highly connected. In this context, liquidity dry-up events are difficult
to predict or anticipate, because they can happen suddenly. This is particularly true for
market liquidity, as shown by the recent flash crash/rally events¹¹. Brunnermeier and Pedersen
(2009) demonstrated that a demand shock can create a flight-to-quality environment in 
which liquidity and loss spirals can arise simply due to funding requirements on speculators 
such as margin calls and repo haircuts. In some instances, a liquidity dry-up event resulting
from a flight-to-quality environment can result in runs, fire sales, and asset liquidations in
general, transforming the market into a contagion mechanism. This is particularly true if
the market size of the early players affected by the shock is large enough to induce a large
increase in price pressure. The likelihood and stringency of these spirals are exacerbated by
high leverage ratios.

Besides network effects and liquidity-based amplification mechanisms, the third identi- 
fied transmission channel for systemic risk relates to the specific function a financial institu- 
tion may come to play in a specific market, either because of its size relative to the market 
or because of its ownership of a specific skill which makes its services essential to the func- 
tioning of that market. De Bandt and Hartmann (2000) identified payment and settlement 
systems as the main critical function that can generate systemic risk. The development of 
central counterparties, which is promoted by the recent financial regulation, is a response to 
mitigate network and counterparty credit risks, but also to strengthen the critical function 
of clearing systems. Other examples of critical services concern the entire investment chain 
from the asset manager to the asset owner, for instance securities lending intermediation 
chains or custody services. 


8.1.3 Supervisory policy responses 


The severity of the Global Financial Crisis led to massive government interventions
around the world to prop up failing financial institutions, seen as ‘too big to fail’. Public
concern about the negative externalities of such interventions called pressingly for structural
reforms to prevent, whenever possible, similar future events. The crisis further brought to
light, among other key factors, the failure of regulation to keep up with the complexity of 
the activities of global financial institutions. In particular, calls for prudential reforms were 
made around the world to create mechanisms to monitor, prevent and resolve the liquidation 


10We recall that the main dimensions are market/funding liquidity, idiosyncratic/systematic liquidity, 
domestic/global liquidity and inside/outside liquidity (see Chapter 6 on page 347). 

11Examples are the flash crash of 6 May 2010 (US stock markets), the flash rally of 15 October 2014 (US
Treasury bonds), the Swiss Franc move of 15 January 2015 (removal of the CHF peg to EUR) and the market
dislocation of 24 August 2015 (stock markets and US ETFs).
of financial institutions without the need for government intervention. Consequently, a vast
program of financial and institutional reforms was undertaken around the world. 


8.1.3.1 A new financial regulatory structure 


As explained in the introduction of this chapter, the Financial Stability Board is an 
international oversight institution created in April 2009 to monitor the stability of the 
global financial system, and not only the activities of banking and insurance industries¹².
Indeed, the 2008 GFC also highlighted the increasing reliance of large institutions on the
shadow banking system. This refers to the broad range of short-term financing products and
activities performed by non-bank actors in the financial markets and therefore historically
not subject to the same regulatory supervision as banking activities. This explains why
the FSB also has the mandate to oversee the systemic risk induced by shadow banking
entities¹³. Besides the analysis of the financial system, the main task of the FSB is the
identification of systemically important financial institutions (SIFI). FSB (2010) defines 
them as institutions whose “distress or disorderly failure, because of their size, complexity 
and systemic interconnectedness, would cause significant disruption to the wider financial 
system and economic activity”. It distinguishes between three types of SIFIs: 


1. G-SIBs correspond to global systemically important banks; 
2. G-SIIs designate global systemically important insurers; 


3. the third category is defined with respect to the two previous ones; it incorporates 
other SIFIs than banks and insurers (non-bank non-insurer global systemically im- 
portant financial institutions or NBNI G-SIFIs). 


Every year since 2013, the FSB has published the list of G-SIFIs. In Tables 8.1 and 8.2, we
report the 2018 list of G-SIBs and the 2016 list of G-SIIs¹⁴. At this time, NBNI G-SIFIs are
not identified, because the assessment methodology has not been finalized¹⁵.


TABLE 8.1: List of global systemically important banks (November 2018)

Agricultural Bank of China    Bank of America       Bank of China
Bank of New York Mellon       Barclays              BNP Paribas
China Construction Bank       Citigroup             Credit Suisse
Deutsche Bank                 Goldman Sachs         Crédit Agricole
BPCE                          HSBC                  ICBC
ING Bank                      JPMorgan Chase        Mitsubishi UFJ FG
Mizuho FG                     Morgan Stanley        Royal Bank of Canada
Santander                     Société Générale      Standard Chartered
State Street                  Sumitomo Mitsui FG    UBS
UniCredit                     Wells Fargo

Source: FSB (2018b), 2018 List of Global Systemically Important Banks.


12For these two financial sectors, the FSB collaborates with the Basel Committee on Banking Supervision
and the International Association of Insurance Supervisors (IAIS). 

13In this last case, the FSB relies on the works of the International Organization of Securities Commissions 
(IOSCO). 

14The list has not been updated since 2016 because FSB and IAIS are in discussion for considering a new 
framework for the assessment and mitigation of systemic risk in the insurance sector. 

15See the discussion on page 466. 
TABLE 8.2: List of global systemically important insurers (November 2016)

Aegon            Allianz                 AIG
Aviva            AXA                     MetLife
Ping An Group    Prudential Financial    Prudential plc


Source: FSB (2016), 2016 List of Global Systemically Important Insurers. 


Systemic risk is also monitored at the regional level with the European Systemic Risk 
Board (ESRB) for the European Union and the Financial Stability Oversight Council 
(FSOC) for the United States. The ESRB was established on 16 December 2010 and is 
part of the European System of Financial Supervision (ESFS), the purpose of which is to 
ensure supervision of the EU financial system¹⁶. As established under the Dodd-Frank reform
in July 2010, the FSOC is composed of the Secretary of the Treasury, the Chairman
of the Federal Reserve and members of US supervision bodies (CFTC, FDIC, OCC, SEC, 
etc.). 


The global financial crisis also had an impact on the banking supervision structure, in
particular in the US and Europe. Since 2010, the Federal Reserve Board has been in
charge of directly supervising large banks and any firm designated as systemically important
by the FSOC (Murphy, 2015). The other banks continue to be supervised by the Federal
Deposit Insurance Corporation (FDIC) and the Office of the Comptroller of the Currency 
(OCC). In Europe, each bank was supervised by its national regulators until the establish- 
ment of the Single Supervisory Mechanism (SSM). Starting from 4 November 2014, large 
European banks are directly supervised by the European Central Bank (ECB), while national
supervisors are in a supporting role. This concerns about 120 significant banks, which
represent 80% of banking assets in the euro area. For each bank supervised by the ECB,
a joint supervisory team (JST) is designated. Its main task is to perform the Supervisory 
Review and Evaluation Process (SREP), propose the supervisory examination programme, 
implement the approved supervisory decisions and ensure coordination with the on-site in- 
spection teams and liaise with the national supervisors. Public awareness of the systemic 
risk has also led some countries to reform national supervision structures. For instance in 
the United Kingdom, the Financial Services Authority (FSA) was replaced in April 2013 by
three new supervisory bodies: the Financial Policy Committee (FPC), which is responsible 
for macro-prudential regulation, the Prudential Regulation Authority (PRA), which is re- 
sponsible for micro-prudential regulation of financial institutions and the Financial Conduct 
Authority (FCA), which is responsible for markets regulation. 


Remark 88 The 2008 Global Financial Crisis has also impacted other financial sectors 
than the banking sector, but not to the same degree. Nevertheless, the powers of existing 
authorities have been expanded in asset management and markets regulation (ESMA, SEC, 
CFTC). In 2010, the European Insurance and Occupational Pensions Authority (EIOPA) 
was established in order to ensure a general supervision at the level of the European Union. 


8.1.3.2 A myriad of new standards 


Reforms of the financial regulatory framework were also attempted around the world in 
order to protect the consumers. Thus, the Dodd-Frank Wall Street Reform and Consumer 


16Besides the ESRB, the ESFS comprises the European Banking Authority (EBA), the European In- 
surance and Occupational Pensions Authority (EIOPA), the European Securities and Markets Authority 
(ESMA) and the Joint Committee of the European Supervisory Authorities. 
Protection Act was signed into law in the US in July 2010. It is the largest financial regulation
overhaul since the 1930s. Besides the reform of the US financial regulatory structure, it also
concerns investment advisers, hedge funds, insurance, central counterparties, credit rating 
agencies, derivatives, consumer financial protection, mortgages, etc. One of the most famous 
propositions is the Volcker rule, which prohibits a bank from engaging in proprietary trading 
and from owning hedge funds and private equity funds. Another controversial proposition 
is the Lincoln amendment (or swaps push-out rule), which would prohibit federal assistance 
to swaps entities. 


In Europe, directives on the regulation of markets in financial instruments (MiFID 1 and 
2) from 2007 to 2014 as well as regulations on packaged retail and insurance-based invest- 
ment products (PRIIPS) with the introduction of the key information document (KID) in 
2014 came to reinforce the regulation and transparency of financial markets and the protec- 
tion of investors. European Market Infrastructure Regulation (EMIR) is another important 
European Union regulation, whose aim is to increase the stability of the OTC derivative 
markets. It introduces reporting obligation for OTC derivatives (trade repositories), clearing 
obligation for eligible OTC derivatives, independent valuation of OTC derivatives, common 
rules for central counterparties and post-trade supervision.


However, the most important reforms concern the banking sector. Many standards of 
the Basel III Accord are directly related to systemic risk. Capital requirements have been 
increased to strengthen the safety of banks. The leverage ratio introduces constraints to 
limit the leverage of banks. The aim of liquidity ratios (LCR and NSFR) is to reduce the 
liquidity mismatch of banks. Stress testing programs have been highly developed. Another
important measure is the designation of systemically important banks¹⁷, which are subject
to a capital surcharge ranging from 1% to 3.5%. All these micro-prudential approaches tend
to mitigate idiosyncratic factors. However, common factors are also present in the Basel
III Accord. Indeed, the Basel Committee has introduced a countercyclical capital buffer
in order to increase the capital of banks during excessive credit growth and to limit the
impact of common factors on the systemic risk. Another important change is the careful
consideration of counterparty credit risk. This includes of course the 1.25 factor to calculate
the default correlation ρ (PD) in the IRB approach¹⁸, but also the CVA capital charge. The
promotion of CCPs since 2010 is also another example to limit network effects and reduce 
the direct interconnectedness between banks. Last but not least, the stressed ES of the 
Basel III Accord had a strong impact on the capital requirements for market risk. 


Remark 89 Another important reform concerns resolution plans, which describe the
bank’s strategy for rapid resolution if its financial situation were to deteriorate or if it
were to default. In Europe, the Bank Recovery and Resolution Directive (BRRD) has applied
to all banks and large investment firms since January 2015. In the United States, the orderly
liquidation authority (OLA) of the Dodd-Frank Act provides a theoretical framework
for bank resolution¹⁹. In Japan, a new resolution regime became effective in March 2014
and ensures that a defaulted bank will be resolved via a bridge bank, where certain assets
and liabilities are transferred. More recently, the FSB finalized the TLAC standard for global
systemically important banks. All these initiatives seek to build a framework to resolve a 
bank failure without public intervention. 


17It concerns both global (G-SIB) and domestic (D-SIB) systemically important banks. 

18See Footnote 70 on page 184. 

19Bank resolution plans can be found at the following web page:
www.federalreserve.gov/bankinforeg/resolution-plans.htm.
8.2 Systemic risk measurement


There are generally two ways of identifying SIFIs. The first one is proposed by supervisors
and considers firm-specific information that is linked to the systemic risk, such as the size
or the leverage. The second approach has been extensively used by academics and considers
market information to measure the impact of the firm-specific default on the entire system. 


8.2.1 The supervisory approach 


In what follows, we distinguish between the three categories defined by the FSB: banks, 
insurers and non-bank non-insurer financial institutions. 


8.2.1.1 The G-SIB assessment methodology 


In order to measure the systemic risk of a bank, BCBS (2014g) considers 12 indicators 
across five large categories. For each indicator, the score of the bank (expressed in basis 
points) is equal to the bank’s indicator value divided by the corresponding sample total²⁰:


$$\text{Indicator Score} = \frac{\text{Bank Indicator}}{\text{Sample Total}} \times 10^4$$


The indicator scores are then averaged to define the category scores and the final score. 
The scoring system is summarized in Table 8.3. Each category has a weight of 20% and 
represents one dimension of systemic risk. The size effect (too big to fail) corresponds to
the first category, but is also present in all other categories. Network effects are reflected in 
category 2 (interconnectedness) and category 4 (complexity). The third category measures 
the degree of critical functions, while the cross-jurisdictional activity tends to identify global 
banks. 


TABLE 8.3: Scoring system of G-SIBs

Category                            Indicator                                    Weight
1  Size                             1   Total exposures                          1/5
2  Interconnectedness               2   Intra-financial system assets            1/15
                                    3   Intra-financial system liabilities       1/15
                                    4   Securities outstanding                   1/15
3  Substitutability/financial       5   Payment activity                         1/15
   institution infrastructure       6   Assets under custody                     1/15
                                    7   Underwritten transactions in
                                        debt and equity markets                  1/15
4  Complexity                       8   Notional amount of OTC derivatives       1/15
                                    9   Trading and AFS securities               1/15
                                    10  Level 3 assets                           1/15
5  Cross-jurisdictional activity    11  Cross-jurisdictional claims              1/10
                                    12  Cross-jurisdictional liabilities         1/10


An example of the score computation is given in Table 8.4. It concerns the G-SIB score 
of BNP Paribas in 2014. Using these figures, the size score is equal to: 


$$\text{Score} = \frac{2\,032}{\text{Sample total}} = 3.06\%$$


20The sample consists of the largest 75 banks defined by the Basel III leverage ratio exposure measure. 




TABLE 8.4: An example of calculating the G-SIB score

Category                        Indicator                                Indicator   Sample      Indicator   Category
                                                                         value(1)    total(1)    score(2)    score(2)
Size                            Total exposures                          2,032                               306
Interconnectedness              Intra-financial system assets            205         7,718                   370
                                Intra-financial system liabilities       435         7,831
                                Securities outstanding                   314         10,836
Substitutability/financial      Payment activity                         49,557      1,850,755               369
institution infrastructure      Assets under custody                     4,181       100,012
                                Underwritten transactions in debt and
                                equity markets
Complexity                      Notional amount of OTC derivatives                                           505
                                Trading and AFS securities
                                Level 3 assets
Cross-jurisdictional activity   Cross-jurisdictional claims                                                  485
                                Cross-jurisdictional liabilities
Final score                                                                                                  407

(1) The figures are expressed in billions of EUR.
(2) The figures are expressed in bps.

Source: BCBS (2014), G-SIB Framework: Denominators; BNP Paribas (2014), Disclosure for
G-SIBs indicators as of 31 December 2013.


The interconnectedness score is an average of three indicator scores. We obtain: 


$$\text{Score} = \frac{1}{3}\left(\frac{205}{7\,718} + \frac{435}{7\,831} + \frac{314}{10\,836}\right) = \frac{2.656\% + 5.555\% + 2.898\%}{3} = 3.70\%$$


The final score is an average of the five category scores: 


$$\text{Score} = \frac{1}{5}\left(3.06\% + 3.70\% + 3.69\% + 5.05\% + 4.85\%\right) = 4.07\%$$
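
The two averages above can be reproduced with a few lines of Python. This is only a sketch: the interconnectedness figures and the four other category scores are those quoted in the text, while the function names are hypothetical.

```python
def indicator_score(bank_value: float, sample_total: float) -> float:
    """Indicator score in basis points: bank value / sample total x 10^4."""
    return bank_value / sample_total * 1e4


def category_score(indicators) -> float:
    """Simple average of the indicator scores of one category (in bps)."""
    scores = [indicator_score(value, total) for value, total in indicators]
    return sum(scores) / len(scores)


# Interconnectedness indicators quoted in the text: (bank value, sample total)
interconnectedness = [(205, 7718), (435, 7831), (314, 10836)]
inter_bps = category_score(interconnectedness)

# Other category scores quoted in the text (size, substitutability,
# complexity, cross-jurisdictional activity), expressed in bps
category_scores = [306, inter_bps, 369, 505, 485]
final_bps = sum(category_scores) / len(category_scores)

print(f"interconnectedness: {inter_bps:.0f} bps, final score: {final_bps:.0f} bps")
```

Because each category has the same 20% weight, the final score is a plain average of the five category scores.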


Depending on the score value, the bank is then assigned to a specific bucket, which is used 
to calculate its specific higher loss absorbency (HLA) requirement. The thresholds used to 
define the buckets are: 


1. 130-229 for Bucket 1 (+1.0% CET1);

2. 230-329 for Bucket 2 (+1.5% CET1);

3. 330-429 for Bucket 3 (+2.0% CET1);

4. 430-529 for Bucket 4 (+2.5% CET1);

5. and 530-629 for Bucket 5 (+3.5% CET1).


For instance, the G-SIB score of BNP Paribas was 407 bps. This implies that BNP Paribas 
belonged to Bucket 3 and the additional buffer was 2% common equity tier 1 at the end of 
2014. 
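
The bucket assignment can be sketched as a simple lookup. The thresholds are those listed above; rounding the score to the nearest basis point before the lookup, and the function name itself, are assumptions made for this illustration.

```python
# (lower bound, upper bound, bucket, HLA surcharge in % of CET1)
BUCKETS = [
    (130, 229, 1, 1.0),
    (230, 329, 2, 1.5),
    (330, 429, 3, 2.0),
    (430, 529, 4, 2.5),
    (530, 629, 5, 3.5),
]


def hla_bucket(score_bps: float):
    """Return (bucket, HLA surcharge) or None if the bank is not a G-SIB.

    Assumption: the score is rounded to the nearest basis point first.
    """
    score = round(score_bps)
    if score < BUCKETS[0][0]:
        return None
    for low, high, bucket, hla in BUCKETS:
        if low <= score <= high:
            return bucket, hla
    raise ValueError("score lies above the upper bound of Bucket 5")


print(hla_bucket(407))   # score of BNP Paribas quoted in the text
print(hla_bucket(100))   # below the cutoff of 130 bps
```

A score of 407 bps falls in the 330-429 range, which gives Bucket 3 and a 2% surcharge, as stated in the text.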
In November 2018, the FSB published the updated list of G-SIBs and the required
level of additional loss absorbency. There are no banks in Bucket 5. The most systemically important bank is
JPMorgan Chase, which is assigned to Bucket 4 (2.5% of HLA requirement). It is followed
by Citigroup, Deutsche Bank and HSBC (Bucket 3 and 2.0% of HLA requirement). Bucket 2 
is composed of 8 banks (Bank of America, Bank of China, Barclays, BNP Paribas, Goldman 
Sachs, ICBC, Mitsubishi UFJ FG and Wells Fargo). The 17 remaining banks given in Table 
8.1 on page 460 form Bucket 1. The situation has changed since the first publication in 
November 2011. Generally, the number of G-SIBs is between 28 and 30 banks. Depending 
on the year, the list may include BBVA, ING, Nordea and Royal Bank of Scotland. Since 
2011, we observe that the number of banks in Buckets 4 and 3 generally decreases, while 
the number of banks in Bucket 2 increases. For instance, in November 2015, Buckets 4 and 
3 were composed of two banks (HSBC and JPMorgan Chase) and four banks (Barclays, 
BNP Paribas, Citigroup and Deutsche Bank). 


Remark 90 The FSB and the BCBS consider a relative measure of the systemic risk. They
first select the universe of the 75 largest banks and then define a G-SIB as a bank whose
total score is higher than the average score²¹. This procedure ensures that there
are always systemic banks. Indeed, if the scores are normally distributed, the number of
systemic banks is half the number of banks in the universe. This explains why the number
of G-SIBs is around 30.
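
The relative nature of the score can be illustrated with a short simulation. The sketch below uses a single indicator and purely hypothetical lognormal indicator values; it only shows that the scores always sum to 10,000 bps, so that the average score is 10,000/75 whatever the data, and that some banks necessarily lie above this average.

```python
import random

random.seed(0)
n_banks = 75

# Hypothetical indicator values (arbitrary units), for illustration only
values = [random.lognormvariate(0.0, 1.0) for _ in range(n_banks)]
total = sum(values)

# Relative scores in basis points: they sum to 10,000 by construction
scores = [value / total * 1e4 for value in values]
average = sum(scores) / n_banks
above_average = sum(score > average for score in scores)

print(f"average score: {average:.1f} bps")
print(f"banks above the average: {above_average} out of {n_banks}")
```

The number of banks above the average depends on the shape of the distribution: it is about one half for a symmetric distribution and smaller when the values are right-skewed.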


Roncalli and Weisang (2015) reported the average rank correlation (in %) between the 
five categories for the G-SIBs as of end 2013: 


$$\begin{pmatrix}
100.0 & & & & \\
84.6 & 100.0 & & & \\
77.7 & 63.3 & 100.0 & & \\
91.5 & 94.5 & 70.1 & 100.0 & \\
91.4 & 90.6 & 84.2 & 95.2 & 100.0
\end{pmatrix}$$


We notice the high correlation coefficients²² between the first (size), second (interconnectedness),
fourth (complexity) and fifth (cross-jurisdictional activity) categories. This is not
surprising, since G-SIBs are the largest banks in the world. In fact, the high correlation between
the five measures masks the multifaceted reality of systemic risk. This is explained
by the homogeneous nature of global systemically important banks in terms of their business
model. Indeed, almost all these financial institutions are universal banks mixing both
commercial and investment banking.


Besides the HLA requirement, the FSB, in consultation with the BCBS, published in
November 2015 its proposed minimum standard for ‘total loss absorbing capacity’ (TLAC). 
According to FSB (2015d), “the TLAC standard has been designed so that failing G-SIBs 
will have sufficient loss-absorbing and recapitalization capacity available in resolution for 
authorities to implement an orderly resolution that minimizes impacts on financial stability, 
maintains the continuity of critical functions, and avoids exposing public funds to loss”. In 
this context, TLAC requirements would be between 8% and 12%. This means that the total
capital would be between 18% and 25% of RWA for G-SIBs²³ as indicated in Figure 8.2.


Remark 91 Recently, the scoring system has slightly changed with the addition of a trading 
volume indicator in the third category. The other categories and weights remain unchanged 


21It is equal to 10^4/75 ≈ 133 bps.

22The highest correlation is between Category 4 and Category 5 (95.2%) whereas the lowest correlation
is between Category 2 and Category 3 (63.3%).

23Using Table 1.5 on page 21, we deduce that the total capital is equal to 6% of tier 1 plus 2% of tier 2
plus 2.5% of conservation buffer (CB) plus 1% to 3.5% of systemic buffer (HLA) plus 8% to 12% of TLAC.
FIGURE 8.2: Impact of the TLAC on capital requirements (stacked capital layers, from top to bottom: TLAC, HLA, CB, T2, AT1, CET1)


except the indicators underwritten transactions in debt and equity markets and trading vol- 
ume, whose weight is equal to 1/30 (BCBS, 2018). 


8.2.1.2 Identification of G-SIIs 


In the case of insurers, the International Association of Insurance Supervisors (IAIS)
has developed an approach similar to the Basel Committee to measure global systemically 
important insurers (or G-SIIs). The final score is an average of five category scores: size, 
interconnectedness, substitutability, non-traditional and non-insurance (NTNI) activities 
and global activity. Contrary to the G-SIB scoring system, the G-SII scoring system does 
not use an equal weight between the category scores. Thus, a 5% weight is applied to 
size, substitutability and global activity, whereas interconnectedness and NTNI activities 
represent respectively 40% and 45% of the weighting. In fact, the score highly depends on the
banking activities (derivatives trading, short term funding, guarantees, etc.) of the insurance
company²⁴.
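
The effect of the unequal weights can be seen in a minimal Python sketch. The weights are those given above; the category scores of the insurer are hypothetical numbers chosen for the illustration, as is the function name.

```python
# Category weights of the G-SII scoring system, as given in the text
GSII_WEIGHTS = {
    "size": 0.05,
    "interconnectedness": 0.40,
    "substitutability": 0.05,
    "ntni": 0.45,
    "global_activity": 0.05,
}


def gsii_score(category_scores: dict) -> float:
    """Weighted average of the five category scores (same unit as the inputs)."""
    missing = set(GSII_WEIGHTS) - set(category_scores)
    if missing:
        raise KeyError(f"missing categories: {sorted(missing)}")
    return sum(GSII_WEIGHTS[name] * category_scores[name] for name in GSII_WEIGHTS)


# Hypothetical insurer: large and global, but with few NTNI activities
insurer = {
    "size": 300.0,
    "interconnectedness": 100.0,
    "substitutability": 200.0,
    "ntni": 50.0,
    "global_activity": 400.0,
}

weighted = gsii_score(insurer)
equal_weight = sum(insurer.values()) / len(insurer)
print(f"G-SII weights: {weighted:.1f}, equal weights: {equal_weight:.1f}")
```

In this example, the large size and global activity scores barely move the result, because 85% of the weight is carried by interconnectedness and NTNI activities.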


8.2.1.3 Extension to NBNI SIFIs 


In March 2015, the FSB published a second consultation document, which proposed 
a methodology for the identification of NBNI SIFIs. The concerned financial sectors were 
finance companies, market intermediaries, asset managers and their funds. The scoring 
system was an imitation of the G-SIFI scoring system with the same five categories. As 
noted by Roncalli and Weisang (2015), this scoring system was not satisfactory, because it
failed to capture the most important systemic risk of these financial institutions, which is 
the liquidity risk. Indeed, a large amount of redemptions may create fire sales and affect 
the liquidity of the underlying market. This liquidity mainly depends on the asset class. For 
instance, we do not face the same risk when investing in an equity fund and in a bond fund. 
In the end, the FSB decided to postpone the assessment framework for NBNI G-SIFIs and
to work specifically on financial stability risks from asset management activities. 


²⁴ See IAIS (2013a) on page 20.


8.2.2 The academic approach


Academics propose various methods to measure the systemic risk. Even if they are
heterogeneous, most of them share a common pattern. They are generally based on publicly
available market data²⁵. Among these different approaches, three prominent measures are particularly
popular: the marginal expected shortfall (MES), the delta conditional value-at-risk (ΔCoVaR) and
the systemic risk measure (SRISK).


Remark 92 In what follows, we define the different systemic risk measures and derive their 
expression in the Gaussian case. Non-Gaussian and non-parametric estimation methods are 
presented in Chapters 10 and 11. 


8.2.2.1 Marginal expected shortfall 


This measure has been proposed by Acharya et al. (2017). Let \(w_i\) and \(L_i\) be the exposure
of the system to institution \(i\) and the corresponding normalized random loss. We note
\(w = (w_1, \ldots, w_n)\) the vector of exposures. The loss of the system is equal to:

\[ L(w) = \sum_{i=1}^{n} w_i L_i \]


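We recall that the expected shortfall \(\mathrm{ES}_\alpha(w)\) with a confidence level \(\alpha\) is the expected loss
conditional on the loss being greater than the value-at-risk \(\mathrm{VaR}_\alpha(w)\):

\[ \mathrm{ES}_\alpha(w) = \mathbb{E}\left[L \mid L \geq \mathrm{VaR}_\alpha(w)\right] \]

The marginal expected shortfall of institution \(i\) is then equal to:

\[ \mathrm{MES}_i = \frac{\partial\, \mathrm{ES}_\alpha(w)}{\partial\, w_i} = \mathbb{E}\left[L_i \mid L \geq \mathrm{VaR}_\alpha(w)\right] \tag{8.3} \]

In the Gaussian case \((L_1, \ldots, L_n) \sim \mathcal{N}(\mu, \Sigma)\), we have found that²⁶:

\[ \mathrm{MES}_i = \mu_i + \frac{\phi\left(\Phi^{-1}(\alpha)\right)}{(1 - \alpha)\sqrt{w^\top \Sigma w}} \cdot (\Sigma w)_i \]

Another expression of MES is then:

\[ \mathrm{MES}_i = \mu_i + \beta_i(w) \cdot \left(\mathrm{ES}_\alpha(w) - \mathbb{E}(L)\right) \tag{8.4} \]

where \(\beta_i(w)\) is the beta of the institution loss with respect to the total loss:

\[ \beta_i(w) = \frac{\operatorname{cov}(L, L_i)}{\sigma^2(L)} = \frac{(\Sigma w)_i}{w^\top \Sigma w} \]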


Acharya et al. (2017) approximated the MES measure as the expected value of the stock
return \(R_i\) when the return of the market portfolio \(R_m\) is below the 5% quantile:

\[ \mathrm{MES}_i = -\mathbb{E}\left[R_i \mid R_m \leq F^{-1}(5\%)\right] \]

where \(F\) is the cumulative distribution function of the market return \(R_m\). We have:

\[ \mathrm{MES}_i = -\frac{1}{\operatorname{card}(\mathcal{T})} \sum_{t \in \mathcal{T}} R_{i,t} \]


²⁵ The reason is that academics do not have access to regulatory or private data.
²⁶ See Equation (2.18) on page 107.

where \(\mathcal{T}\) represents the set of trading days, which corresponds to the 5% worst days for
the market return. Another way of implementing the MES measure is to specify the components
of the system and the confidence level \(\alpha\) for defining the conditional expectation.
For instance, the system can be defined as the set of the largest banks and \(w_i\) is the size of
Bank \(i\) (measured by the market capitalization or the total amount of assets).
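
The empirical estimator above can be sketched as follows. This is a minimal illustration, not the implementation of Acharya et al. (2017): the function name `empirical_mes`, the rounding rule for the number of tail days and the 20 daily returns are all assumptions made for the example.

```python
def empirical_mes(stock_returns, market_returns, tail=0.05):
    """Average stock loss over the worst `tail` fraction of market days."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("both return series must have the same length")
    n = len(market_returns)
    # Number of tail days; at least one day is kept (assumed convention).
    k = max(1, round(tail * n))
    # Indices of the k days with the lowest market return (the set T).
    worst_days = sorted(range(n), key=lambda t: market_returns[t])[:k]
    return -sum(stock_returns[t] for t in worst_days) / k


# Hypothetical daily returns (20 trading days), for illustration only.
market_returns = [0.010, -0.020, 0.005, -0.080, 0.012, 0.003, -0.010, 0.020,
                  -0.050, 0.007, 0.001, -0.004, 0.015, -0.030, 0.009, 0.002,
                  -0.006, 0.011, 0.004, -0.001]
stock_returns = [0.015, -0.030, 0.002, -0.120, 0.020, 0.001, -0.012, 0.025,
                 -0.070, 0.010, 0.000, -0.008, 0.018, -0.040, 0.012, 0.004,
                 -0.010, 0.016, 0.006, -0.002]

mes_5 = empirical_mes(stock_returns, market_returns, tail=0.05)
mes_10 = empirical_mes(stock_returns, market_returns, tail=0.10)
print(f"MES (5% worst days):  {mes_5:.2%}")
print(f"MES (10% worst days): {mes_10:.2%}")
```

In practice the sample would cover several years of daily returns, so that the 5% tail contains enough observations for the average to be meaningful.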


Example 76 We consider a system composed of 3 banks. The total assets managed by these
banks are respectively equal to $139, $75 and $81 bn. We assume that the annual normalized
losses are Gaussian. Their means are equal to zero whereas their standard deviations are set
equal to 10%, 12% and 15%. Moreover, the correlations are given by the following matrix:

\[ C = \begin{pmatrix} 100\% & & \\ 75\% & 100\% & \\ 82\% & 85\% & 100\% \end{pmatrix} \]


By considering a 95% confidence level, the value-at-risk of the system is equal to $53.86
bn. Using the analytical results given in Section 2.3 on page 104, we deduce that the
systemic expected shortfall \(\mathrm{ES}_{95\%}\) of the entire system reaches the amount of $67.55 bn.
Finally, we calculate the MES and obtain the values reported in Table 8.5. The MES is
expressed in %. This means that if the total assets managed by the first bank increase by
$1 bn, the systemic expected shortfall will increase by $0.19 bn. In the fourth column of the
table, we have indicated the risk contribution \(\mathrm{RC}_i\), which is the product of the size \(w_i\) and
the marginal expected shortfall \(\mathrm{MES}_i\). This quantity is also called the systemic expected
shortfall of institution \(i\):

\[ \mathrm{SES}_i = \mathrm{RC}_i = w_i \cdot \mathrm{MES}_i \]

We have also reported the beta coefficient \(\beta_i(w)\) (expressed in bps). Because we have
\(\mu_i = 0\), we verify that the marginal expected shortfall is equal to the beta times the systemic
expected shortfall.


TABLE 8.5: Risk decomposition of the 95% systemic expected shortfall


Bank       w_i         MES_i     SES_i       β_i(w)     β_i(w*)
           (in $ bn)   (in %)    (in $ bn)   (in bps)
1          139         19.28     26.80       28.55      0.84
2           75         22.49     16.87       33.29      0.98
3           81         29.48     23.88       43.64      1.29
ES_α(w)                          67.55
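
The figures of Example 76 can be checked with a short sketch based on the Gaussian formulas above. It only uses the inputs stated in the example; the variable names are illustrative, and the standard-library `statistics.NormalDist` class is used for the Gaussian quantile and density.

```python
from math import sqrt
from statistics import NormalDist

# Inputs of Example 76 (exposures in $ bn, zero-mean normalized losses).
w = [139.0, 75.0, 81.0]
sigma = [0.10, 0.12, 0.15]
rho = [[1.00, 0.75, 0.82],
       [0.75, 1.00, 0.85],
       [0.82, 0.85, 1.00]]
alpha = 0.95
n = len(w)

# Covariance matrix, the vector (Sigma w)_i and the system variance w' Sigma w.
cov = [[rho[i][j] * sigma[i] * sigma[j] for j in range(n)] for i in range(n)]
cov_w = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
var_L = sum(w[i] * cov_w[i] for i in range(n))
sd_L = sqrt(var_L)

std_normal = NormalDist()
q = std_normal.inv_cdf(alpha)                # Phi^{-1}(alpha)
es_scale = std_normal.pdf(q) / (1 - alpha)   # phi(Phi^{-1}(alpha)) / (1 - alpha)

var_sys = q * sd_L                           # value-at-risk of the system
es_sys = es_scale * sd_L                     # systemic expected shortfall
mes = [es_scale * cov_w[i] / sd_L for i in range(n)]
ses = [w[i] * mes[i] for i in range(n)]      # risk contributions w_i * MES_i
beta = [cov_w[i] / var_L for i in range(n)]  # beta_i(w)
beta_star = [b * sum(w) for b in beta]       # beta_i(w*), relative weights

print(f"VaR = {var_sys:.2f}, ES = {es_sys:.2f} (in $ bn)")
for i in range(n):
    print(f"Bank {i + 1}: MES = {mes[i]:.2%}, SES = {ses[i]:.2f}, "
          f"beta = {1e4 * beta[i]:.2f} bps, beta* = {beta_star[i]:.2f}")
```

The risk contributions sum to the systemic expected shortfall, which is the Euler allocation property behind the last row of Table 8.5.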


The marginal expected shortfall can be used to rank the relative systemic risk of a set
of financial institutions. For instance, in the previous example, the third bank
is the riskiest according to the MES. However, the first bank, which has the lowest
MES value, has the highest systemic expected shortfall, because its size is larger than that of the
two other banks. This is why we must not confuse the relative (or marginal) risk and the
absolute risk.


The marginal expected shortfall has been criticized because it measures the systematic
risk of a financial institution, and not necessarily its systemic risk. In Table 8.5, we give the
traditional beta coefficient \(\beta_i(w^\star)\), which is calculated with respect to the relative weights
\(w_i^\star = w_i / \sum_{j=1}^{n} w_j\). As already shown in Equation (8.4), ranking the financial institutions
by their MES is equivalent to ranking them by their beta coefficients. In practice, we can
nevertheless observe some minor differences because stock returns are not exactly Gaussian.


8.2.2.2 Delta conditional value-at-risk


Adrian and Brunnermeier (2016) define the CoVaR as the value-at-risk of the system
conditional on some event \(\mathcal{E}_i\) of institution \(i\):


\[ \Pr\left\{L(w) \leq \mathrm{CoVaR}_i(\mathcal{E}_i) \mid \mathcal{E}_i\right\} = \alpha \]


Adrian and Brunnermeier determine the risk contribution of institution \(i\) as the difference
between the CoVaR conditional on the institution being in a distressed situation and the
CoVaR conditional on the institution being in a normal situation:

\[ \Delta\mathrm{CoVaR}_i = \mathrm{CoVaR}_i(D_i = 1) - \mathrm{CoVaR}_i(D_i = 0) \]

where \(D_i\) indicates if the bank is in a distressed situation or not. Adrian and Brunnermeier
use the value-at-risk to characterize the distress situation:

\[ D_i = 1 \Leftrightarrow L_i = \mathrm{VaR}_\alpha(L_i) \]

whereas the normal situation corresponds to the case when the loss of institution \(i\) is equal
to its median²⁷:

\[ D_i = 0 \Leftrightarrow L_i = m(L_i) \]

Finally, we obtain:

\[ \Delta\mathrm{CoVaR}_i = \mathrm{CoVaR}_i\left(L_i = \mathrm{VaR}_\alpha(L_i)\right) - \mathrm{CoVaR}_i\left(L_i = m(L_i)\right) \tag{8.5} \]


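In the Gaussian case and using the previous notations, we have:

\[ \begin{pmatrix} L_i \\ L(w) \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_i \\ w^\top \mu \end{pmatrix}, \begin{pmatrix} \sigma_i^2 & (\Sigma w)_i \\ (\Sigma w)_i & w^\top \Sigma w \end{pmatrix} \right) \]

We deduce that²⁸:

\[ L(w) \mid L_i = \ell_i \sim \mathcal{N}\left(\mu(\ell_i), \sigma^2(\ell_i)\right) \]

where:

\[ \mu(\ell_i) = w^\top \mu + \frac{\ell_i - \mu_i}{\sigma_i^2} (\Sigma w)_i \]

and:

\[ \sigma^2(\ell_i) = w^\top \Sigma w - \frac{(\Sigma w)_i^2}{\sigma_i^2} \]

It follows that:

\[ \begin{aligned} \mathrm{CoVaR}_i(L_i = \ell_i) &= \mu(\ell_i) + \Phi^{-1}(\alpha)\, \sigma(\ell_i) \\ &= w^\top \mu + \frac{\ell_i - \mu_i}{\sigma_i^2} (\Sigma w)_i + \Phi^{-1}(\alpha) \sqrt{w^\top \Sigma w - \frac{(\Sigma w)_i^2}{\sigma_i^2}} \end{aligned} \]

Because \(\mathrm{VaR}_\alpha(L_i) = \mu_i + \Phi^{-1}(\alpha)\, \sigma_i\) and \(m(L_i) = \mathbb{E}(L_i) = \mu_i\), we obtain:

\[ \begin{aligned} \Delta\mathrm{CoVaR}_i &= \mathrm{CoVaR}_i\left(L_i = \mu_i + \Phi^{-1}(\alpha)\, \sigma_i\right) - \mathrm{CoVaR}_i(L_i = \mu_i) \\ &= \Phi^{-1}(\alpha) \cdot \frac{(\Sigma w)_i}{\sigma_i} \\ &= \Phi^{-1}(\alpha) \cdot \sum_{j=1}^{n} w_j \rho_{i,j} \sigma_j \end{aligned} \]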
j=l 


²⁷ In this case, we have \(m(L_i) = \mathrm{VaR}_{50\%}(L_i)\).
²⁸ We use results of the conditional expectation given in Appendix A.2.2.4 on page 1062.

where \(\rho_{i,j}\) is the correlation between banks \(i\) and \(j\). Another expression of \(\Delta\mathrm{CoVaR}_i\) is:

\[ \Delta\mathrm{CoVaR}_i = \Phi^{-1}(\alpha) \cdot \frac{\sigma^2(L)}{\sigma_i} \cdot \beta_i(w) \tag{8.6} \]


The Gaussian case highlights different properties of the CoVaR measure: 


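• If the losses are independent, meaning that \(\rho_{i,j} = 0\), the Delta CoVaR is the unexpected
loss, which is the difference between the nominal value-at-risk and the nominal median
(or expected) loss:

\[ \begin{aligned} \Delta\mathrm{CoVaR}_i &= \Phi^{-1}(\alpha) \cdot w_i \cdot \sigma_i \\ &= w_i \cdot \left(\mathrm{VaR}_\alpha(L_i) - m(L_i)\right) \\ &= w_i \cdot \mathrm{UL}_\alpha(L_i) \end{aligned} \]

• If the losses are perfectly dependent, meaning that \(\rho_{i,j} = 1\), the Delta CoVaR is the
sum of the unexpected losses over all financial institutions:

\[ \begin{aligned} \Delta\mathrm{CoVaR}_i &= \Phi^{-1}(\alpha) \cdot \sum_{j=1}^{n} w_j \sigma_j \\ &= \sum_{j=1}^{n} w_j \cdot \mathrm{UL}_\alpha(L_j) \end{aligned} \]

In this case, the Delta CoVaR measure does not depend on the financial institution.

• The sum of all Delta CoVaRs is a weighted sum of the unexpected losses:

\[ \begin{aligned} \sum_{i=1}^{n} \Delta\mathrm{CoVaR}_i &= \Phi^{-1}(\alpha) \cdot \sum_{i=1}^{n} \sum_{j=1}^{n} w_j \rho_{i,j} \sigma_j \\ &= \Phi^{-1}(\alpha) \cdot \sum_{j=1}^{n} w_j \sigma_j \sum_{i=1}^{n} \rho_{i,j} \\ &= n \sum_{j=1}^{n} w_j \bar{\rho}_j \cdot \mathrm{UL}_\alpha(L_j) \end{aligned} \]

where \(\bar{\rho}_j\) is the average correlation between institution \(j\) and the other institutions
(including itself). This quantity has no financial interpretation and is not a coherent
risk measure satisfying the Euler allocation principle.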


Remark 93 In practice, losses are approximated by stock returns. Empirical results show
that MES and CoVaR measures may give different rankings. This can be easily explained
in the Gaussian case. Indeed, measuring systemic risk with MES is equivalent to analyzing
the beta of each financial institution whereas the CoVaR approach consists of ranking them
by their beta divided by their volatility. If the beta coefficients are very close, the CoVaR
ranking will be highly sensitive to the volatility of the financial institution's stock.


We consider Example 76 and report in Table 8.6 the calculation of the 95% CoVaR
measure. If Bank 1 suffers a loss larger than its 95% value-at-risk ($22.86 bn), it induces a
Delta CoVaR of $50.35 bn. This systemic loss includes the initial loss of Bank 1, but also
additional losses of the other banks due to their interconnectedness. We notice that CoVaR
and MES both identify Bank 3 as the riskiest institution in this example, although they rank
Banks 1 and 2 differently. However, if we define the systemic
risk as the additional loss on the other components of the system²⁹, we find that the stress
on Bank 2 induces the largest loss on the other banks³⁰.


TABLE 8.6: Calculation of the 95% CoVaR measure


Bank   w_i         VaR_α(L_i)            CoVaR_i(E_i)         ΔCoVaR_i
       (in $ bn)   (in %)    (in $ bn)   D_i = 1   D_i = 0    (in $ bn)
1      139         16.45     22.86       69.48     19.13      50.35
2       75         19.74     14.80       71.44     22.50      48.94
3       81         24.67     19.98       67.69     16.37      51.32
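
The Gaussian CoVaR formulas can be applied to Example 76 with the following sketch. It relies only on the inputs of the example and on the conditional mean and variance derived above; the helper `covar` and the other names are illustrative, and the "spillover" column corresponds to the additional loss on the other banks discussed in the text.

```python
from math import sqrt
from statistics import NormalDist

# Inputs of Example 76 (exposures in $ bn, zero-mean normalized losses).
w = [139.0, 75.0, 81.0]
sigma = [0.10, 0.12, 0.15]
rho = [[1.00, 0.75, 0.82],
       [0.75, 1.00, 0.85],
       [0.82, 0.85, 1.00]]
alpha = 0.95
n = len(w)

cov = [[rho[i][j] * sigma[i] * sigma[j] for j in range(n)] for i in range(n)]
cov_w = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]  # (Sigma w)_i
var_L = sum(w[i] * cov_w[i] for i in range(n))                       # w' Sigma w
q = NormalDist().inv_cdf(alpha)                                      # Phi^{-1}(alpha)


def covar(i, loss_i):
    """CoVaR of the system given L_i = loss_i (all means are zero here)."""
    cond_mean = loss_i / sigma[i] ** 2 * cov_w[i]
    cond_sd = sqrt(var_L - cov_w[i] ** 2 / sigma[i] ** 2)
    return cond_mean + q * cond_sd


results = []
for i in range(n):
    var_i = q * sigma[i]              # VaR of the normalized loss L_i
    distress = covar(i, var_i)        # D_i = 1: L_i equal to its VaR
    normal = covar(i, 0.0)            # D_i = 0: L_i equal to its median
    delta = distress - normal
    spillover = delta - w[i] * var_i  # loss borne by the other banks
    results.append((distress, normal, delta, spillover))
    print(f"Bank {i + 1}: CoVaR(D=1) = {distress:.2f}, CoVaR(D=0) = {normal:.2f}, "
          f"Delta CoVaR = {delta:.2f}, spillover = {spillover:.2f}")
```

Because the conditional standard deviation does not depend on the conditioning value, it cancels in the difference, so that the Delta CoVaR reduces to \(\Phi^{-1}(\alpha) \cdot (\Sigma w)_i / \sigma_i\).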


The dependence function between financial institutions is very important when calculating
the CoVaR measure. For instance, we consider again Example 76 with a constant
correlation matrix \(C_3(\rho)\). In Figure 8.3, we represent the relationship between \(\Delta\mathrm{CoVaR}_i\)
and the uniform correlation \(\rho\). When losses are independent, we obtain the value-at-risk of
each bank. When losses are comonotonic, \(\Delta\mathrm{CoVaR}_i\) is the sum of the VaRs. Because losses
are perfectly correlated, a stress on one bank is entirely transmitted to the other banks.


[Figure: ΔCoVaR_i as a function of the uniform correlation ρ (in %), with levels marked at VaR_α(Bank 1), VaR_α(Bank 2), VaR_α(Bank 3) and VaR_α(Bank 1) + VaR_α(Bank 2) + VaR_α(Bank 3).]


FIGURE 8.3: Impact of the uniform correlation on ΔCoVaR_i
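
The relationship shown in Figure 8.3 can be reproduced numerically with a minimal sketch. It assumes the inputs of Example 76 and the closed-form Gaussian expression of the Delta CoVaR; the function name `delta_covar_uniform` and the grid of correlations are choices made for this illustration.

```python
from statistics import NormalDist

# Inputs of Example 76 (exposures in $ bn, zero-mean normalized losses).
w = [139.0, 75.0, 81.0]
sigma = [0.10, 0.12, 0.15]
alpha = 0.95
q = NormalDist().inv_cdf(alpha)


def delta_covar_uniform(rho):
    """Delta CoVaR of each bank when all cross-correlations are equal to rho."""
    n = len(w)
    return [
        q * sum(w[j] * sigma[j] * (1.0 if i == j else rho) for j in range(n))
        for i in range(n)
    ]


standalone_var = [q * w[i] * sigma[i] for i in range(len(w))]

for rho in (0.0, 0.25, 0.50, 0.75, 1.0):
    values = ", ".join(f"{x:6.2f}" for x in delta_covar_uniform(rho))
    print(f"rho = {rho:4.0%}: {values}")
```

At zero correlation each Delta CoVaR equals the stand-alone value-at-risk of the bank, and at a correlation of 100% all three coincide with the sum of the stand-alone values-at-risk.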


8.2.2.3 Systemic risk measure 


Another popular risk measure is the systemic risk measure (SRISK) proposed by Acharya 
et al. (2012), which is a new form of the systemic expected shortfall of Acharya et al. (2017) 
and which was originally developed by Brownlees and Engle (2016) in 2010. Using a stylized 


²⁹ This additional loss is equal to \(\Delta\mathrm{CoVaR}_i - w_i \cdot \mathrm{VaR}_\alpha(L_i)\).
³⁰ The additional loss (expressed in $ bn) is equal to 27.49 for Bank 1, 34.13 for Bank 2 and 31.33 for
Bank 3.

balance sheet, the capital shortfall \(\mathrm{CS}_{i,t}\) of institution \(i\) at time \(t\) is the difference between
the required capital \(K_{i,t}\) and the market value of equity \(V_{i,t}\):


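\[ \mathrm{CS}_{i,t} = K_{i,t} - V_{i,t} \]

We assume that \(K_{i,t}\) is equal to \(k \cdot A_{i,t}\) where \(A_{i,t}\) is the asset value and \(k\) is the capital
ratio (typically 8% in Basel II). We also have \(A_{i,t} = D_{i,t} + V_{i,t}\) where \(D_{i,t}\) represents the
debt value³¹. We deduce that:

\[ \begin{aligned} \mathrm{CS}_{i,t} &= k \cdot (D_{i,t} + V_{i,t}) - V_{i,t} \\ &= k \cdot D_{i,t} - (1 - k) \cdot V_{i,t} \end{aligned} \]

We define the capital shortfall of the system as the total amount of capital shortfall \(\mathrm{CS}_{i,t}\):

\[ \mathrm{CS}_t = \sum_{i=1}^{n} \mathrm{CS}_{i,t} \]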


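Acharya et al. (2012) define the amount of systemic risk as the expected value of the capital
shortfall conditionally to a systemic stress \(S\):

\[ \begin{aligned} \mathrm{SRISK}_t &= \mathbb{E}\left[\mathrm{CS}_{t+1} \mid S\right] \\ &= \sum_{i=1}^{n} \mathbb{E}\left[\mathrm{CS}_{i,t+1} \mid S\right] \\ &= \sum_{i=1}^{n} \left( k \cdot \mathbb{E}\left[D_{i,t+1} \mid S\right] - (1 - k) \cdot \mathbb{E}\left[V_{i,t+1} \mid S\right] \right) \end{aligned} \]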

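They also assume that \(\mathbb{E}\left[D_{i,t+1} \mid S\right] \approx D_{i,t}\) and:

\[ \mathbb{E}\left[V_{i,t+1} \mid S\right] = (1 - \mathrm{MES}_{i,t}) \cdot V_{i,t} \]

where \(\mathrm{MES}_{i,t}\) is the marginal expected shortfall conditionally to the systemic risk \(S\). By
using the leverage ratio \(\mathrm{LR}_{i,t}\) defined as the asset value divided by the market value of
equity:

\[ \mathrm{LR}_{i,t} = \frac{A_{i,t}}{V_{i,t}} = 1 + \frac{D_{i,t}}{V_{i,t}} \]

they finally obtain the following expression of the systemic risk³²:

\[ \mathrm{SRISK}_t = \sum_{i=1}^{n} \left( k \cdot (\mathrm{LR}_{i,t} - 1) - (1 - k) \cdot (1 - \mathrm{MES}_{i,t}) \right) \cdot V_{i,t} \]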


We notice that the systemic risk can be decomposed as the sum of the risk contributions
\(\mathrm{SRISK}_{i,t}\). We have:

\[ \mathrm{SRISK}_{i,t} = \vartheta_{i,t} \cdot V_{i,t} \tag{8.7} \]

with:

\[ \vartheta_{i,t} = k \cdot \mathrm{LR}_{i,t} + (1 - k) \cdot \mathrm{MES}_{i,t} - 1 \tag{8.8} \]

In these two formulas, \(k\) and \(\mathrm{MES}_{i,t}\) are expressed in % while \(\mathrm{SRISK}_{i,t}\) and \(V_{i,t}\) are measured
in $. \(\mathrm{SRISK}_{i,t}\) is then a linear function of the market capitalization \(V_{i,t}\), which is a proxy
of the capital in this model. The scaling factor \(\vartheta_{i,t}\) depends on 4 parameters:


³¹ Here, we assume that the bank capital is equal to the market value, which is not the case in practice.
³² We have \(D_{i,t} = (\mathrm{LR}_{i,t} - 1) \cdot V_{i,t}\).

1. \(k\) is the capital ratio. In the model, we have \(K_{i,t} = k \cdot A_{i,t}\) whereas the capital \(K_{i,t}\) is
equal to \(k \cdot \mathrm{RWA}_{i,t}\) in Basel Accords. Under some assumptions, \(k\) can be set equal to
8% in the Basel I or Basel II framework. For Basel III and Basel IV, we must use a
higher value, especially for SIFIs.


2. \(\mathrm{LR}_{i,t}\) is the leverage ratio of institution \(i\). The higher the leverage ratio, the higher
the systemic risk.


3. The systemic risk is an increasing function of the marginal expected shortfall. Because
we have \(\mathrm{MES}_{i,t} \in [0, 1]\), we deduce that:

\[ (k \cdot \mathrm{LR}_{i,t} - 1) \cdot V_{i,t} \leq \mathrm{SRISK}_{i,t} \leq k \cdot (\mathrm{LR}_{i,t} - 1) \cdot V_{i,t} \]

A high value of the MES decreases the market value of equity, and then the absorption
capacity of systemic losses.


4. The marginal expected shortfall depends on the stress scenario. In the different publications
on the SRISK measure, the stress \(S\) generally corresponds to a 40% drop of
the equity market:

\[ \mathrm{MES}_{i,t} = -\mathbb{E}\left[R_{i,t+1} \mid R_{m,t+1} \leq -40\%\right] \]


Example 77 We consider a universe of 4 banks, whose characteristics are given in the
table below³³:


Bank   V_{i,t}   LR_{i,t}   μ_i    σ_i    ρ_{i,m}
1       57       23         0%     25%    70%
2       65       28         0%     24%    75%
3       91       13         0%     22%    68%
4      120       20         0%     20%    65%


We assume that the expected return \(\mu_m\) and the volatility \(\sigma_m\) of the equity market are equal
to 0% and 17%.


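Using the conditional expectation formula, we have:

\[ \mathbb{E}\left[R_{i,t+1} \mid R_{m,t+1} = S\right] = \mu_i + \rho_{i,m}\, \sigma_i \cdot \frac{S - \mu_m}{\sigma_m} \]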

We can then calculate the marginal expected shortfall and deduce the scaling factor and 
the systemic risk contribution thanks to Equations (8.7) and (8.8). Results are given in 
Table 8.7. In this example, the main contributor is bank 2 because of its high leverage ratio 
followed by bank 4 because of its high market capitalization. In Table 8.8, we show how the 
SRISK measure changes with respect to the stress S. 
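
The calculations of Example 77 can be sketched as follows, using Equations (8.7) and (8.8) and the Gaussian conditional expectation. The capital ratio \(k = 8\%\) is an assumption of this sketch (it is the typical Basel II value quoted in the text and it is consistent with the reported figures); the function name `srisk_table` and the tuple layout are illustrative.

```python
K_RATIO = 0.08              # capital ratio k (assumed equal to 8%)
MU_M, SIGMA_M = 0.0, 0.17   # expected return and volatility of the equity market

# (V in $ bn, leverage ratio, mu_i, sigma_i, rho_im) for the 4 banks of Example 77.
banks = [
    (57.0, 23.0, 0.0, 0.25, 0.70),
    (65.0, 28.0, 0.0, 0.24, 0.75),
    (91.0, 13.0, 0.0, 0.22, 0.68),
    (120.0, 20.0, 0.0, 0.20, 0.65),
]


def srisk_table(stress):
    """Return (MES, scaling factor, SRISK in $ bn, SRISK share) for each bank."""
    rows = []
    for equity, leverage, mu, sigma, rho in banks:
        # MES is minus the expected return conditional on R_m = stress.
        mes = -(mu + rho * sigma * (stress - MU_M) / SIGMA_M)
        theta = K_RATIO * leverage + (1 - K_RATIO) * mes - 1   # Equation (8.8)
        rows.append((mes, theta, theta * equity))              # Equation (8.7)
    total = sum(row[2] for row in rows)
    return [(mes, theta, srisk, srisk / total) for mes, theta, srisk in rows]


base = srisk_table(-0.40)
for stress in (-0.20, -0.40, -0.60):
    print(f"Stress S = {stress:.0%}")
    for i, (mes, theta, srisk, share) in enumerate(srisk_table(stress), start=1):
        print(f"  Bank {i}: MES = {mes:.2%}, theta = {theta:.2f}, "
              f"SRISK = {srisk:.2f} ({share:.2%})")
```

Changing `K_RATIO` or the stress level shows how sensitive the ranking is to the capital ratio and to the scenario.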


According to Acharya et al. (2012), the most important SIFIs in the United States
were Bank of America, JPMorgan Chase, Citigroup and Goldman Sachs in 2012. They also
noticed that four insurance companies were in the top 10 (MetLife, Prudential Financial,
AIG and Hartford Financial). Engle et al. (2015) conducted the same exercise on European
institutions with the same methodology. They found that the five most important SIFIs in
Europe were Deutsche Bank, Crédit Agricole, Barclays, Royal Bank of Scotland and BNP
Paribas. Curiously, HSBC was only ranked 15th and the first insurance company,
AXA, was 16th. This ranking system is updated on a daily basis by the Volatility Institute at


³³ The market capitalization \(V_{i,t}\) is expressed in $ bn.

TABLE 8.7: Calculation of the SRISK measure (S = −40%)


Bank   MES_{i,t}   ϑ_{i,t}   SRISK_{i,t}
       (in %)                (in $ bn)   (in %)
1      41.18       1.22       69.47      22.11
2      42.35       1.63      105.93      33.70
3      35.20       0.36       33.11      10.54
4      30.59       0.88      105.77      33.65


TABLE 8.8: Impact of the stress S on SRISK


Bank   S = −20%             S = −40%             S = −60%
       (in $ bn)   (in %)   (in $ bn)   (in %)   (in $ bn)   (in %)
1       58.7       22.6      69.5       22.1      80.3       21.7
2       93.3       36.0     105.9       33.7     118.6       32.1
3       18.4        7.1      33.1       10.5      47.8       13.0
4       88.9       34.3     105.8       33.7     122.7       33.2


New York University³⁴. In Tables 8.9, 8.10 and 8.11, we report the 10 largest systemic risk
contributions by region at the end of November 2015. The ranking within a region seems 
to be coherent, but the difference in the magnitude of SRISK between American, European 
and Asian financial institutions is an issue. 


Remark 94 The main drawback of the model is that \(\mathrm{SRISK}_{i,t}\) is very sensitive to the
market capitalization with two effects. The direct effect (\(\mathrm{SRISK}_{i,t} = \vartheta_{i,t} \cdot V_{i,t}\)) implies that
the systemic risk is reduced when the equity market is stressed, whereas the indirect effect
due to the leverage ratio increases the systemic risk. When we analyze the two
effects simultaneously, the first effect is greater. However, we generally observe an increase of the SRISK,
because the marginal expected shortfall is much higher in crisis periods.


8.2.2.4 Network measures 


The previous approaches can help to identify systemically important financial institutions.
However, they cannot tell us whether or not there is a systemic risk. For instance,
size is not always the right metric for measuring the systemic risk. If we consider the hedge
fund industry, the three most famous bankruptcies are LTCM in 1998 ($4.6 bn), Amaranth
in 2006 ($6.5 bn) and Madoff in 2008 ($65 bn). Even if the loss was very large, the Madoff
collapse could not produce a systemic risk, because it was a Ponzi scheme, meaning that
Madoff assets were not connected to the market. So there were no feedback and spillover
effects. In a similar way, the collapse of Amaranth had no impact on the market, except for
natural gas futures contracts. Therefore, Amaranth was mainly connected to the market via
CCPs. The case of LTCM is completely different, because LTCM was highly leveraged and
connected to banks because of interest rate swaps. These three examples show that size is
not always a good indicator of systemic risk and interconnectedness is a key parameter for
understanding systemic risk. Another issue concerns the sequence of a systemic crisis. In
the previous approaches, the origin of a systemic risk is a stress, but there are some events
that cannot be explained by such models. This is generally the case of flash crashes, for
example the US Stock Market flash crash of 6 May 2010, the US Treasury flash crash of


³⁴ The internet web page is vlab.stern.nyu.edu.

TABLE 8.9: Systemic risk contributions in America (2015-11-27)


Rank   Institution              SRISK_{i,t}          MES_{i,t}   LR_{i,t}
                                (in $ bn)   (in %)   (in %)
1      Bank of America           49.7       10.75    2.75        11.42
2      Citigroup                 44.0        9.52    3.23        10.83
3      JPMorgan Chase            42.6        9.22    3.09         9.74
4      Prudential Financial      37.6        8.13    3.07        19.64
5      MetLife                   33.9        7.33    2.85        15.40
6      Morgan Stanley            28.6        6.20    3.50        12.60
7      Banco do Brasil           24.1        5.22    4.00        29.45
8      Goldman Sachs             20.3        4.38    3.21        10.51
9      Manulife Financial        20.1        4.36    3.43        15.04
10     Power Corp of Canada      16.2        3.50    2.82        26.81

Source: Volatility Institute (2015), vlab.stern.nyu.edu. 


TABLE 8.10: Systemic risk contributions in Europe (2015-11-27) 


Rank   Institution              SRISK_{i,t}          MES_{i,t}   LR_{i,t}
                                (in $ bn)   (in %)   (in %)
1      BNP Paribas               94.1        8.63    3.42        33.41
2      Crédit Agricole           88.1        8.09    4.22        59.34
3      Barclays                  86.3        7.92    4.31        36.60
4      Deutsche Bank             86.1        7.90    4.32        53.61
5      Société Générale          61.3        5.63    3.85        38.74
6      Royal Bank of Scotland    39.5        3.63    3.15        24.23
7      Banco Santander           38.3        3.51    3.79        18.57
8      HSBC                      34.5        3.16    2.49        15.96
9      UniCredit                 33.1        3.04    3.58        27.21
10     London Stock Exchange     31.3        2.87    2.90        52.67


Source: Volatility Institute (2015), vlab.stern.nyu.edu. 


TABLE 8.11: Systemic risk contributions in Asia (2015-11-27) 


Rank   Institution                  SRISK_{i,t}          MES_{i,t}   LR_{i,t}
                                    (in $ bn)   (in %)   (in %)
1      Mitsubishi UFJ FG            121.5        9.45    2.41        24.80
2      China Construction Bank      117.3        9.12    2.61        17.01
3      Bank of China                 94.5        7.35    2.53        15.21
4      Mizuho FG                     93.7        7.29    2.10        31.84
5      Agricultural Bank of China    92.0        7.16    0.66        19.20
6      Sumitomo Mitsui FG            85.7        6.67    2.71        26.99
7      ICBC                          58.4        4.54    0.84        13.80
8      Bank of Communications        45.0        3.50    2.47        16.89
9      Industrial Bank               29.4        2.29    1.38        17.94
10     National Australia Bank       27.4        2.13    3.27        13.48

Source: Volatility Institute (2015), vlab.stern.nyu.edu.

15 October 2014 and the US ETF flash crash of 24 August 2015. These three events have
been extensively studied by US regulators (FRB, SEC and CFTC). However, they never
found the original cause of these crashes. In fact, such systemic risk events are generally
explained by a network risk: small events can propagate in a very dense network and
produce a large risk because of spillover effects.


Billio et al. (2012) and Cont et al. (2013) introduce network analysis in order to study 
the systemic risk of a financial system. In this case, the nodes of the network correspond 
to financial institutions. Their goal is to measure the connectivity and the centrality of 
each node in the network. For instance, Figure 8.4 represents the network structure of 
the Brazilian banking system estimated by Cont et al. (2013). The idea is to estimate the 
contribution of each node to the loss of the system. In this case, the risk contribution depends 
on the centrality of the node and the density of the network. In particular, they conclude 
that their results “emphasize the contribution of heterogeneity in network structure and 
concentration of counterparty exposures to a given institution in explaining its systemic 
importance”. The method proposed by Billio et al. (2012) is different since it considers 
Granger-causality networks. However, the two approaches pursue the same goal, which is 
to propose a measure of connectedness. 


FIGURE 8.4: Network structure of the Brazilian banking system 


Source: Cont et al. (2013). 


Acemoglu et al. (2015) have studied the impact of the complexity on the interbank
market. They showed that network density can enhance financial stability when (external)
shocks are small. But when external shocks are large, a complete network³⁵ is riskier
than a sparse network. This result does not depend on the size of financial institutions.


³⁵ It corresponds to a network where a financial institution is connected to all the other financial institutions.

These different results illustrate that the systemic risk cannot be reduced to the balance
sheet size of financial institutions, but also depends on the connectedness or the density of
the network³⁶. This is why network risks can be an important component of the systemic
risk.
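
The notions of density and connectivity can be made concrete with a minimal sketch. It compares a complete network and a ring-shaped sparse network of six hypothetical institutions using two elementary graph statistics; these are generic illustrations and not the specific measures of Billio et al. (2012), Cont et al. (2013) or Acemoglu et al. (2015).

```python
from itertools import combinations


def density(n_nodes, edges):
    """Share of possible undirected links that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def degree_centrality(n_nodes, edges):
    """Number of links of each node divided by the maximum possible (n - 1)."""
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [d / (n_nodes - 1) for d in degree]


n = 6  # hypothetical number of financial institutions

# Complete network: every institution is linked to all the others.
complete_edges = list(combinations(range(n), 2))
# Sparse network: each institution is only linked to its two neighbours (ring).
ring_edges = [(i, (i + 1) % n) for i in range(n)]

for name, edges in (("complete", complete_edges), ("sparse", ring_edges)):
    print(f"{name:8s}: density = {density(n, edges):.2f}, "
          f"centrality = {degree_centrality(n, edges)}")
```

In an actual application, links would be weighted by bilateral exposures, so that the contribution of a node depends on both its position and the size of its counterparty exposures.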


FIGURE 8.6: A sparse network 


³⁶ Figures 8.5 and 8.6 show two examples of networks. The first one is a completely connected network,
while the second figure corresponds to a sparse network.


8.3 Shadow banking system


This section on the shadow banking has been included in this chapter together with 
systemic risk because they are highly connected. 


8.3.1 Definition 


The shadow banking system (SBS) can be defined as financial entities or activities 
involved in credit intermediation outside the regular banking system (FSB, 2011; IMF, 
2014b). This non-bank credit intermediation complements banking credit, but is not subject 
to the same regulatory framework. Another important difference is that “shadow banks are 
financial intermediaries that conduct maturity, credit, and liquidity transformation without 
access to central bank liquidity or public sector credit guarantees” (Pozsar et al., 2013). 
In this context, shadow banks can raise systemic risk issues similar to those of regular banks in
terms of liquidity, leverage and asset liability mismatch risks.

However, the main characteristic of shadow banking risk is certainly the high intercon- 
nectedness within shadow banks and with the banking system. If we describe the shadow 
banking system in terms of financial entities, it includes finance companies, broker-dealers 
and asset managers, whose activities are essential for the functioning of the banking system. 
If we focus on instruments, the shadow banking corresponds to short-term debt securities 
that are critical for banks’ funding. In particular, this concerns money and repo markets. 
These linkages between the two systems can then create spillover risks, because stress in 
the shadow banking system may be transmitted to the rest of the financial system (IMF, 
2014b). For instance, run risk in shadow banking is certainly the main source of spillover 
effects and the highest concern of systemic risk. The case of money market funds (MMF) 
during the 2008 GFC is a good example of the contribution of shadow banking to systemic
risk. This dramatic episode also highlights agency and moral hazard problems. Credit
risk transfer using asset-backed commercial paper (ABCP) and structured investment vehi- 
cles (SIV) is not always transparent for investors of money market funds. This opacity risk 
increases redemption risk during periods of stress (IMF, 2014b). This led the Federal Re- 
serve to introduce the ABCP money market mutual fund liquidity facility (AMLF) between 
September 2008 and February 2010 in order to support MMFs. 


Concepts of shadow banking and NBNI SIFI are very close. To date, the focus was more 
on financial entities that can be assimilated to shadow banks or systemic institutions. More 
recently, we observe a refocusing on instruments and activities. These two approaches go 
together when measuring the shadow banking system. 


8.3.2 Measuring the shadow banking 


FSB (2015e) defines two measures of the shadow banking system. The broad measure 
considers all assets of non-bank financial institutions, while the narrow measure only con- 
siders the assets that are part of the credit intermediation chain. 


8.3.2.1 The broad (or MUNFI) measure 


The broad measure corresponds to the amount of financial assets held by insurance companies,
pension funds and other financial intermediaries (OFI). OFIs comprise all financial
institutions that are not central banks (CB), banks, insurance companies (IC), pension funds
(PF), public financial institutions (PFI) or financial auxiliaries (FA). This broad measure
is also called the MUNFI³⁷ measure of assets. Table 8.12 shows the amount of assets managed
by financial institutions and listed in the 2017 monitoring exercise³⁸. Assets rose from
$127.8 tn in 2002 to $339.9 tn in 2016. This growth is explained by an increase in all financial
sectors. In 2016, the MUNFI measure is equal to $159.3 tn, representing 46.9% of the
total assets with the following breakdown: $29.1 tn for insurance companies (18.2%), $31.0
tn for pension funds (19.4%) and $99.2 tn for other financial intermediaries (62.3%). The
MUNFI measure is then larger than banks' assets, which are equal to $137.8 tn in 2016.


TABLE 8.12: Assets of financial institutions (in $ tn) 


Year   CB     Banks   PFI    IC     PF     OFI    FA    Total   MUNFI
                                                                (in $ tn)   (in %)
2002    4.7    52.6   11.2   14.9   11.9   32.4   0.2   127.8    59.2       46.3%
2003    5.5    62.2   12.0   19.3   13.8   39.9   0.3   152.9    73.0       47.7%
2004    6.4    73.1   12.1   22.6   15.3   46.3   0.3   176.0    84.2       47.8%
2005    6.8    76.9   11.9   21.4   16.5   49.9   0.2   183.7    87.8       47.8%
2006    7.7    89.5   11.9   25.3   18.3   60.6   0.3   213.6   104.2       48.8%
2007   10.1   110.7   13.0   29.8   19.8   73.4   0.3   257.1   123.0       47.8%
2008   14.5   123.3   14.2   21.2   19.4   65.8   0.4   258.9   106.5       41.1%
2009   15.1   124.1   14.6   23.7   21.9   70.6   0.6   270.6   116.2       43.0%
2010   16.7   129.8   14.8   25.4   24.4   74.8   0.6   286.5   124.6       43.5%
2011   20.3   139.2   15.0   26.2   25.4   75.7   0.7   302.5   127.3       42.1%
2012   22.4   143.5   15.0   27.9   27.4   83.2   0.8   320.1   138.4       43.3%
2013   23.0   142.0   14.7   28.6   28.9   90.9   0.8   328.9   148.4       45.1%
2014   23.2   138.9   14.7   28.8   29.6   94.9   0.8   330.9   153.3       46.3%
2015   23.6   133.5   15.1   28.2   29.6   94.3   0.7   325.0   152.0       46.8%
2016   26.2   137.8   16.0   29.1   31.0   99.2   0.7   339.9   159.3       46.9%


Source: FSB (2018a) and author’s calculations. 
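
The construction of the broad measure can be illustrated with the 2016 row of Table 8.12. This is only a sketch of the arithmetic (insurance companies, pension funds and OFIs are summed and compared with the total); the dictionary `assets_2016` simply copies the rounded figures of the table, so the recomputed total may differ from the reported one by rounding.

```python
# Assets of financial institutions in 2016 (in $ tn), copied from Table 8.12.
assets_2016 = {
    "CB": 26.2, "Banks": 137.8, "PFI": 16.0,
    "IC": 29.1, "PF": 31.0, "OFI": 99.2, "FA": 0.7,
}

# Broad (MUNFI) measure: insurance companies, pension funds and OFIs.
munfi_sectors = ("IC", "PF", "OFI")
munfi = sum(assets_2016[sector] for sector in munfi_sectors)

# Total recomputed from rounded figures (the table reports 339.9).
total = sum(assets_2016.values())
munfi_share = munfi / total

print(f"MUNFI = {munfi:.1f} tn ({munfi_share:.1%} of total assets)")
for sector in munfi_sectors:
    print(f"  {sector}: {assets_2016[sector] / munfi:.1%} of MUNFI")
print(f"MUNFI larger than banks' assets: {munfi > assets_2016['Banks']}")
```

The small gaps with the percentages quoted in the text come from the rounding of the table entries to one decimal place.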


Financial assets managed by OFIs are under the scrutiny of the FSB, which has adopted
the following classification: money market funds (MMF), hedge funds (HF), other investment
funds³⁹ (IF), real estate investment trusts and real estate funds (REIT), trust companies
(TC), finance companies (FC), broker-dealers (BD), structured finance vehicles (SFV),
central counterparties (CCP) and captive financial institutions and money lenders⁴⁰ (CFI).
Table 8.13 gives the breakdown of assets by category. We can now decompose the amount
of $99.2 tn assets reached in 2016 by category of OFIs. 38.1% of these assets concern other
investment funds (equity funds, fixed income funds and multi-asset funds). Broker-dealers
are an important category of OFIs as they represent 8.8% of assets. They are followed by money
market funds (5.1%) and structured finance vehicles (4.5%). We notice that the asset management
industry (money market funds, hedge funds, other investment funds and real
estate investment companies) represents around 50% of OFIs' assets. The smallest category
concerns central counterparties, whose assets are equal to $404 bn in 2016.

The broad measure suffers from one major shortcoming, because it is an entity-based 
measure and not an asset-based measure. It then includes both shadow banking assets 
and other assets. This is particularly true for equity assets, which are not shadow banking 


³⁷ MUNFI is the acronym of 'monitoring universe of non-bank financial intermediation'.

³⁸ This exercise covers 29 countries, including for instance BRICS, Japan, the Euro area, the United States
and the United Kingdom.

³⁹ They correspond to equity funds, fixed income funds and multi-asset funds.

⁴⁰ They are institutional units that provide financial services, e.g. holding companies used to channel
financial flows between group entities or treasury management companies.

TABLE 8.13: Assets of OFIs (in $ tn)


Year   MMF   HF    IF     REIT   TC    FC    BD    SFV   CCP   CFI   Other
2002   3.2   0.0    5.6   0.2    0.0   2.3   3.2   2.3   0.0   2.0   13.6
2003   3.3   0.0    7.5   0.2    0.0   2.7   3.8   2.7   0.0   2.3   17.3
2004   3.4   0.0    8.9   0.4    0.0   2.9   4.6   3.3   0.0   2.7   20.3
2005   3.4   0.0   10.0   0.5    0.0   2.8   5.0   4.2   0.0   3.5   20.6
2006   4.0   0.0   12.6   0.6    0.1   2.9   5.7   5.1   0.0   3.8   26.0
2007   5.1   0.5   16.2   0.6    0.1   3.0   6.5   6.5   0.0   4.4   30.4
2008   6.0   0.6   17.0   0.6    0.3   3.2   9.3   6.4   0.1   4.2   18.1
2009   5.5   0.6   21.9   0.7    0.5   3.6   7.9   9.0   0.5   4.3   16.0
2010   4.8   0.8   25.0   0.8    0.7   3.8   8.7   7.6   0.5   4.6   17.5
2011   4.5   1.5   24.3   1.0    1.0   3.9   9.1   6.7   0.5   4.4   18.9
2012   4.4   2.5   28.8   1.8    1.5   3.5   9.3   6.2   0.5   4.6   20.5
2013   4.5   2.9   33.5   1.4    2.2   3.3   9.1   5.7   0.4   4.7   23.1
2014   4.7   3.5   35.3   1.5    2.7   3.4   9.6   5.1   0.4   4.5   24.1
2015   5.1   3.5   35.1   1.5    2.9   3.4   8.7   4.7   0.4   4.5   24.5
2016   5.0   3.7   37.8   1.6    3.4   3.4   8.7   4.5   0.4   5.1   25.7


Source: FSB (2018a) and author’s calculations. 


TABLE 8.14: Wholesale funding 


                                            Banks             OFIs
                                        2011     2016     2011     2016
Funding                   Repo          5.82%    5.52%    6.99%    4.14%
(in % of balance sheet)   ST wholesale  4.74%    5.01%    2.91%    4.04%
                          LT wholesale  6.94%    7.03%    9.10%    6.45%
Repo                      Assets        3.33     4.16     3.05     4.01
(in $ tn)                 Liabilities   4.58     4.72     2.92     3.19
                          Net position  −0.60    −0.58    0.14     0.83

assets⁴¹. In this context, the FSB has developed more relevant measures. In Figure 8.7, we
have reported the credit assets calculated by the FSB. In 2016, the credit intermediation
by banks was equal to $92 tn. At the same time, credit assets by insurance companies
and pension funds (ICPF) were equal to $22 tn, whereas the credit intermediation by OFIs
peaked at $38 tn. The FSB proposes a sub-decomposition of these credit assets by reporting
the lending assets (loans and receivables). The difference between credit and lending assets
is essentially composed of investments in debt securities. This decomposition is shown in
Figure 8.7. We notice that loans are the main component of banks' credit assets (76%),
whereas they represent a small part of the credit intermediation by ICPFs (9%). For OFIs,
loans explain 40% of credit assets, but we observe differences between OFIs' sectors. Finance
companies and broker-dealers are the main contributors to lending by OFIs.


Remark 95 Since 2016, the FSB also monitors the funding liquidity, in particular whole- 
sale funding instruments including repurchase agreements (repo). Some figures are given in 
Table 8.14. Together, short-term wholesale funding and repos represent 10.5% and 8.1% of 
the balance sheet of banks and OFIs. We also notice that OFIs are net providers of cash 
from repos to the financial system, whereas banks are net recipients of cash through repos. 


41 This concerns for instance equity mutual funds and long/short equity hedge funds.

FIGURE 8.7: Credit assets (in $ tn)


Source: FSB (2018a) and author’s calculations. 


8.3.2.2 The narrow measure 


Since 2015, the FSB has produced a more relevant measure of the shadow banking system,
which is called the narrow measure. The narrow measure is based on the classification of the 
shadow banking system by economic functions given in Table 8.15. Each of these economic 
functions involves a shadow banking activity, such as non-bank credit intermediation and/or 
liquidity/maturity transformation and/or leverage. Moreover, an entity may be classified 
into two or more economic functions. 


The first economic function is related to redemption risks and concerns forced liquidations
in a hostile environment. For instance, the lack of liquidity of some fixed income
instruments implies a premium for the first investors who unwind their positions on money 
market and bond funds. In this case, one can observe a run on such funds exactly like a 
bank run because investors lose confidence in such products and do not want to be the 
last to move. Run risk can then be transmitted to the entire asset class. This risk mainly 
concerns collective investment vehicles, whose underlying assets face liquidity issues (fixed 
income, real estate). The second and fourth economic functions concern lending and credit 
that are conducted outside of the banking system. The third economic function is related 
to market intermediation on short-term funding. This includes securities broking services 
for market making activities and prime brokerage services to hedge funds. Finally, the last 
economic function corresponds to credit securitization. 


The FSB uses these five economic functions in order to calculate the narrow measure
defined in Figure 8.8. It considers that pension funds and insurance companies do not
participate in the narrow shadow banking system, except credit insurance companies.
Nevertheless, this last category represents less than $200 bn, implying that the narrow
measure principally concerns OFIs. Each OFI is classified or not among the five economic
functions by the FSB. For instance, equity funds, closed-end funds without leverage and
equity REITs are excluded from the shadow banking estimate. Finally, the FSB also removes
entities that are subsidiaries of a banking group and consolidated at the group level for
prudential purposes⁴².


TABLE 8.15: Classification of the shadow banking system by economic functions


Economic
function   Definition                              Typical entity types
EF1        Management of collective investment     Fixed-income funds, mixed funds,
           vehicles with features that are         credit hedge funds, real estate
           susceptible to runs                     funds
EF2        Loan provision that is dependent on     Finance companies, leasing,
           short-term funding                      factoring and consumer credit
                                                   companies
EF3        Intermediation of market activities     Broker-dealers, securities
           that is dependent on short-term         finance companies
           funding or on secured funding of
           client assets
EF4        Facilitation of credit creation         Credit insurance companies,
                                                   financial guarantors, monolines
EF5        Securitization-based credit             Securitization vehicles,
           intermediation and funding of           structured finance vehicles,
           financial entities                      asset-backed vehicles


Source: FSB (2018a).


TABLE 8.16: Size of the narrow shadow banking (in $ tn)


Year              2010    2011    2012    2013    2014    2015    2016
Banks            129.8   139.2   143.5   142.0   138.9   133.5   137.8
OFIs              74.8    75.7    83.2    90.9    94.9    94.3    99.2
Shadow banking    28.4    30.2    32.9    35.6    39.0    42.0    45.2


Source: FSB (2018a) and author’s calculations.


In Table 8.16, we report the size of the narrow shadow banking and compare it with assets
of banks and OFIs. The narrow measure represents 46% of total assets managed by OFIs.
These shadow banking assets are located in developed countries, in particular in the United
States, Japan, Germany, the United Kingdom, Canada and France (see Figure 8.9). We also
notice the weight of China (16%) and the importance of three locations: Cayman Islands,
Luxembourg and Ireland. These countries are generally used as the domicile of complex
mutual funds and alternative investment funds. If we analyze the assets with respect to
economic functions, EF1 represents 71.6% of the assets followed by EF5 (9.6%) and EF3
(8.4%), meaning that the shadow banking system involves in the first instance money market
and credit funds that are exposed to run risks, securitization vehicles and broker-dealer
activities. However, these figures differ considerably from one country to another.


42 This category represents almost 15% of OFIs’ assets.

FIGURE 8.8: Calculation of the shadow banking narrow measure

FIGURE 8.9: Breakdown by country of shadow banking assets (2016)

FSB (2018a) also provides network measures between the banking system and OFIs. To
do so, it estimates the aggregate bilateral balance sheet exposure between the two sectors,
netting exposures within banking groups that are prudentially consolidated:


• assets of banks to OFIs include loans to institutions, fixed income securities, reverse
repos and investments in money market funds and other investment funds;


• liabilities of banks to OFIs consist of uninsured bank deposits (e.g. certificates of
deposit, notes and commercial paper), reverse repos and other short-term debt instruments.


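[Diagram. Top panel: OFI j facing the banks, with the OFI’s credit risk \(R^{(\mathrm{credit})}_{\mathrm{OFI}_j}\) and the OFI’s funding risk \(R^{(\mathrm{funding})}_{\mathrm{OFI}_j}\). Bottom panel: bank i facing the OFIs, with the bank’s credit risk \(R^{(\mathrm{credit})}_{\mathrm{Bank}_i}\) and the bank’s funding risk \(R^{(\mathrm{funding})}_{\mathrm{Bank}_i}\).]


FIGURE 8.10: Interconnectedness between banks and OFIs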


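Linkages between banks and OFIs are represented in Figure 8.10. These linkages measure
the interconnectedness between a set \(i \in \mathcal{I}\) of banks and a set \(j \in \mathcal{J}\) of OFIs. Let \(A_{\mathrm{Bank}_i}\) and
\(A_{\mathrm{OFI}_j}\) be the total amount of assets managed by bank \(i\) and OFI \(j\). We note \(A_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j}\) and
\(L_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j}\) the assets and liabilities of bank \(i\) to OFI \(j\), and \(A_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i}\) and \(L_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i}\)
the assets and liabilities of OFI \(j\) to bank \(i\). By construction, we have \(A_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j} = L_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i}\)
and \(L_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j} = A_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i}\). In the bottom panel, we have represented
the linkage from the bank’s perspective. In this case, the credit and funding risks of bank \(i\)
are equal to:
\[
R^{(\mathrm{credit})}_{\mathrm{Bank}_i} = \frac{A_{\mathrm{Bank}_i \rightarrow \mathrm{OFIs}}}{A_{\mathrm{Bank}_i}}
\]
and:
\[
R^{(\mathrm{funding})}_{\mathrm{Bank}_i} = \frac{L_{\mathrm{Bank}_i \rightarrow \mathrm{OFIs}}}{A_{\mathrm{Bank}_i}}
\]
where the aggregate measures are equal to \(A_{\mathrm{Bank}_i \rightarrow \mathrm{OFIs}} = \sum_{j \in \mathcal{J}} A_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j}\) and
\(L_{\mathrm{Bank}_i \rightarrow \mathrm{OFIs}} = \sum_{j \in \mathcal{J}} L_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j}\). In the same way, we can calculate the interconnectedness
from the OFI’s viewpoint as shown in the top panel. As above, we define the credit
and funding risks of OFI \(j\) in the following way:
\[
R^{(\mathrm{credit})}_{\mathrm{OFI}_j} = \frac{A_{\mathrm{OFI}_j \rightarrow \mathrm{Banks}}}{A_{\mathrm{OFI}_j}}
\]
and:
\[
R^{(\mathrm{funding})}_{\mathrm{OFI}_j} = \frac{L_{\mathrm{OFI}_j \rightarrow \mathrm{Banks}}}{A_{\mathrm{OFI}_j}}
\]
where \(A_{\mathrm{OFI}_j \rightarrow \mathrm{Banks}} = \sum_{i \in \mathcal{I}} A_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i}\) and \(L_{\mathrm{OFI}_j \rightarrow \mathrm{Banks}} = \sum_{i \in \mathcal{I}} L_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i}\). Using
the Shadow Banking Monitoring Dataset 2017, FSB (2018a) finds the following average
interconnectedness ratios: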


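Ratio   \(R^{(\mathrm{credit})}_{\mathrm{Banks}}\)   \(R^{(\mathrm{funding})}_{\mathrm{Banks}}\)   \(R^{(\mathrm{credit})}_{\mathrm{OFIs}}\)   \(R^{(\mathrm{funding})}_{\mathrm{OFIs}}\)
2008    6.8%    6.7%    9.5%    9.8%
2016    5.6%    5.4%    6.3%    6.7%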


This means that 5.4% of banks’ funding depends on the shadow banking system, while the
credit risk of banks to OFIs is equal to 5.6% of banks’ assets. We also notice that 6.7% of
OFIs’ assets are funded by banks, while the investments of OFIs in banks reach 6.3% of
their assets. It is interesting to compare these figures with those during or before the 2008
Global Financial Crisis. We observe that the interconnectedness between banks and OFIs
has decreased. For example, the OFI use of funding from banks was 9.8%, while the bank
use of funding from OFIs was 6.7%. These figures give an overview of the linkages between
banking and OFI sectors. In practice, the interconnectedness is stronger because these
ratios are calculated by netting exposures within banking groups. It is obvious that linkages
are higher in these entities.
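
The four ratios defined above can be sketched in a few lines of Python. This is only an illustration: the two banks, the two OFIs and all amounts below are hypothetical figures chosen for the example, not FSB data, and the function names are ours. The sketch uses the identities \(A_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i} = L_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j}\) and \(L_{\mathrm{OFI}_j \rightarrow \mathrm{Bank}_i} = A_{\mathrm{Bank}_i \rightarrow \mathrm{OFI}_j}\) to derive the OFI ratios from the bank-side exposures.

```python
from typing import Dict, Tuple

Exposure = Dict[Tuple[str, str], float]  # keyed by (bank, OFI)


def bank_ratios(bank_assets: Dict[str, float], claims: Exposure,
                liabilities: Exposure) -> Dict[str, Tuple[float, float]]:
    """Return {bank: (credit ratio, funding ratio)}."""
    ratios = {}
    for bank, total in bank_assets.items():
        credit = sum(v for (b, _), v in claims.items() if b == bank)
        funding = sum(v for (b, _), v in liabilities.items() if b == bank)
        ratios[bank] = (credit / total, funding / total)
    return ratios


def ofi_ratios(ofi_assets: Dict[str, float], claims: Exposure,
               liabilities: Exposure) -> Dict[str, Tuple[float, float]]:
    """Return {OFI: (credit ratio, funding ratio)}."""
    ratios = {}
    for ofi, total in ofi_assets.items():
        # Assets of the OFI on banks are the liabilities of banks to the OFI
        credit = sum(v for (_, o), v in liabilities.items() if o == ofi)
        # Liabilities of the OFI to banks are the assets of banks on the OFI
        funding = sum(v for (_, o), v in claims.items() if o == ofi)
        ratios[ofi] = (credit / total, funding / total)
    return ratios


# Hypothetical balance sheets (arbitrary currency units)
bank_assets = {"B1": 1000.0, "B2": 500.0}
ofi_assets = {"O1": 400.0, "O2": 200.0}
# Assets of bank i on OFI j
claims = {("B1", "O1"): 40.0, ("B1", "O2"): 10.0,
          ("B2", "O1"): 20.0, ("B2", "O2"): 5.0}
# Liabilities of bank i to OFI j
liabilities = {("B1", "O1"): 30.0, ("B1", "O2"): 20.0,
               ("B2", "O1"): 10.0, ("B2", "O2"): 15.0}

bank_r = bank_ratios(bank_assets, claims, liabilities)
ofi_r = ofi_ratios(ofi_assets, claims, liabilities)
for name, (credit, funding) in {**bank_r, **ofi_r}.items():
    print(f"{name}: credit = {credit:.1%}, funding = {funding:.1%}")
```

With these made-up figures, both banks have credit and funding ratios of 5%, whereas the first OFI has a credit ratio of 10% and a funding ratio of 15%.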


8.3.3 Regulatory developments of shadow banking 


The road map for regulating shadow banking, which is presented in FSB (2013), focuses 
on four key principles: 


• measurement and analysis of the shadow banking;
• mitigation of interconnectedness risk between banks and shadow banking entities;
• reduction of the run risk posed by money market funds;
• and improvement of transparency in securitization and more generally in complex
shadow banking activities.


8.3.3.1 Data gaps 


As seen in the previous section, analyzing the shadow banking system is a big challenge,
because it is extremely difficult to measure it. In order to address this issue, the FSB and
the IMF are in charge of the implementation of the G-20 data gaps initiative (DGI). DGI
is not specific to shadow banking, but is a more ambitious program for monitoring the
systemic risk of the global financial system⁴³. However, it is obvious that shadow banking
is becoming an important component of DGI. This concerns in particular short-term debt
instruments, bonds, securitization and repo markets. Trade repositories, which collect data
at the transaction level, complement regulatory reporting to understand shadow banking. They
already exist for some OTC instruments in the EU and the US, but they will certainly be
expanded to other markets (e.g. collateralized transactions). Simultaneously, supervisory
authorities have strengthened regulatory reporting processes. However, the level of
transparency in the shadow banking has still not reached that of banks. Some shadow banking
sectors, in particular asset management and pension funds, should then expect new reporting
requirements.


43 For instance, DGI concerns financial soundness indicators (FSI), CDS and securities statistics, banking 
statistics, public sector debt, real estate prices, etc.

8.3.3.2 Mitigation of interconnectedness risk


BCBS (2013c) has introduced new capital requirements for banks’ equity investments 
in funds that are held in the banking book. They concern investment funds, mutual funds, 
hedge funds and private equity funds. The framework includes three methods to calculate 
the capital charge: the fall-back approach (FBA), the mandate-based approach (MBA) and 
the look-through approach (LTA). In this latter approach, the bank determines the risk- 
weighted assets of the underlying exposures of the fund. This approach is less conservative 
than the other two, but requires full transparency on the portfolio holdings. Under
the fall-back approach, the risk weight is equal to 1250% whatever the risk of the fund. 
According to the BCBS (2013c), the hierarchy in terms of risk sensitivity between the three 
approaches was introduced to promote “due diligence by banks and transparent reporting 
by the funds in which they invest”. This framework had a significant impact on investment 
policy of banks and has reduced investments in equity funds and hedge funds. 


BCBS (2014c) has developed new standards for measuring and controlling large expo- 
sures to single counterparties. This concerns different levels of aggregation from the legal 
entity to consolidated groups. The large exposures framework is applicable to all interna- 
tional banks, and implies that the exposure of a bank to a consolidated group must be 
lower than 25% of the bank capital. This figure is reduced to 15% for systemic banks. This
framework therefore penalizes banking groups that have shadow banking activities (insurance,
asset management, brokerage, etc.).


8.3.3.3 Money market funds 


Money market funds have been under the scrutiny of regulatory authorities since the September
2008 run in the United States. The International Organization of Securities Commissions 
(2012a) recalled that the systemic risk of these funds is explained by three factors: 


1. the illusory perception that MMFs don’t have market and credit risks and benefit 
from capital protection; 


2. the first mover advantage, which is pervasive during periods of market distress; 
3. and the discrepancy between the published NAV and the asset value. 


In order to mitigate these risks, IOSCO (2012a) proposed several recommendations con- 
cerning the management of MMFs. In particular, they should be explicitly defined, the 
investment universe should be restricted to high quality money market and low-duration 
fixed income instruments, and they should be priced with the fair value approach. Moreover, 
MMFs that maintain a stable NAV (e.g. $1 per share) should be converted into floating NAV.


In September 2015, the IOSCO reviewed the implementation progress made by 31 ju- 
risdictions in adopting regulation and policies of MMFs. In particular, this review concerns 
the five largest jurisdictions (US, Ireland, China, France and Luxembourg), which together 
account for 90% of global assets under management in MMFs. At that time, only the US 
reported having final implementation measures in all recommendations, while China and 
Europe were in the process of finalizing relevant reforms. In July 2014, the US Securities 
and Exchange Commission adopted final rules for the reform of MMFs. In particular, insti- 
tutional MMFs will be required to trade at floating NAV. Moreover, all MMFs may impose 
liquidity fees and redemption gates during periods of stress. In China, the significant growth 
of the MMF market has forced the Chinese regulator to introduce a number of policy mea- 
sures in February 2016. This concerns accounting and valuation methods, redefinition of 
the investment universe, liquidity management and responsibilities of fund managers. In 
Europe, new rules have applied since July 2017. They distinguish three categories of MMFs:
variable NAV, public debt CNAV and low volatility NAV. These new rules include liquidity
management (liquidity fees and redemption gates), prohibit the use of sponsor support and 
redefine the universe of eligible assets. 


8.3.3.4 Complex shadow banking activities 


We give here some supervisory initiatives related to some shadow banking activities. In
2011, the European Union adopted the Alternative Investment Fund Managers Directive
(AIFMD), which complements the UCITS directive for asset managers and applies to hedge
fund managers, private equity fund managers and real estate fund managers. In particular, it
imposes reporting requirements and defines the AIFM passport. In a similar way to MMFs,
IOSCO (2012b) published recommendations to improve incentive alignments in securitization,
in particular by including issuer risk retention. According to IMF (2014a), Nomura
and Daiwa, which are the two largest securities brokerages in Japan, are now subject to Basel
II capital requirements and bank-like prudential supervision. New regulation proposals on
securities financing transactions (SFT) have been made by the European Commission. They
concern reporting, transparency and collateral reuse of SFT activities (repo market,
securities lending). These few examples show that the regulation of the shadow banking is in
progress and non-bank financial institutions should expect to be better controlled in the
future.

Chapter 9


Model Risk of Exotic Derivatives 


In Chapter 2, we have seen that options and derivative instruments present non-linear risks 
that are more difficult to assess and measure than for a long-only portfolio of stocks or 
bonds. Moreover, those financial instruments are traded in OTC markets, meaning that 
their market value is not known with certainty. These issues imply that the current value is 
a mark-to-model price and the risk factors depend on the pricing model and the underlying 
assumptions. The pricing problem is then at the core of the risk management of derivative 
instruments. However, risk management of such financial products cannot be reduced to a 
pricing problem. Indeed, the main difficulty lies in managing dynamically the hedging of the 
option in order to ensure that the replication cost is equal to the option price. In this case, 
the real challenge is the model risk and concerns three levels: the model risk of pricing the 
option, the model risk of hedging the option and the discrepancy risk between the pricing 
model and the hedging model. Therefore, this chapter cannot be just a catalogue of pricing 
models, but focuses more on pricing errors and hedging uncertainties. 


9.1 Basics of option pricing 


In this section, we present the basic models that are used for pricing derivatives instru- 
ments: the Black-Scholes model, the Vasicek model and the HJM model. While the first one 
is general and valid for all asset classes, the last two models concern interest rate derivatives. 


9.1.1 The Black-Scholes model 
9.1.1.1 The general framework 


Black and Scholes (1973) assume that the dynamics of the asset price \(S(t)\) is given by
a geometric Brownian motion:
\[
\begin{cases}
\mathrm{d}S(t) = \mu S(t)\,\mathrm{d}t + \sigma S(t)\,\mathrm{d}W(t) \\
S(t_0) = S_0
\end{cases}
\tag{9.1}
\]
where \(S_0\) is the current price, \(\mu\) is the drift, \(\sigma\) is the volatility of the diffusion and \(W(t)\)
is a standard Brownian motion. We consider a contingent claim that pays \(f(S(T))\) at the
maturity \(T\) of the derivative contract. For example, if we consider a European option with
strike \(K\), we have \(f(S(T)) = (S(T) - K)^{+}\).

Under some conditions, we can show that this contingent claim may be replicated by a
hedging portfolio, which is composed of the asset and a risk-free asset, whose instantaneous
return is equal to \(r(t)\). The price \(V\) of the contingent claim is then equal to the cost of
the hedging portfolio. In this case, Black and Scholes show that it is the solution of the
following backward equation:
\[
\begin{cases}
\frac{1}{2}\sigma^2 S^2 \partial_S^2 V(t,S) + \left(\mu - \lambda(t)\sigma\right) S\,\partial_S V(t,S) + \partial_t V(t,S) - r(t)V(t,S) = 0 \\
V(T,S(T)) = f(S(T))
\end{cases}
\]
This equation is called the fundamental pricing equation. The function \(\lambda(t)\) is interpreted
as the risk price of the Wiener process \(W(t)\). For an asset whose cost-of-carry is equal to
\(b(t)\), we have:
\[
\lambda(t) = \frac{\mu - b(t)}{\sigma}
\]
The previous equation then becomes:
\[
\begin{cases}
\frac{1}{2}\sigma^2 S^2 \partial_S^2 V(t,S) + b(t) S\,\partial_S V(t,S) + \partial_t V(t,S) - r(t)V(t,S) = 0 \\
V(T,S(T)) = f(S(T))
\end{cases}
\tag{9.2}
\]
The current price of the derivatives contract is obtained by solving this partial differential
equation (PDE) and taking \(V(t_0,S_0)\).

A way to obtain the solution is to apply the Girsanov theorem¹ to the SDE (9.1) with
\(g(t) = -\lambda(t)\). It follows that:
\[
\begin{cases}
\mathrm{d}S(t) = b(t)S(t)\,\mathrm{d}t + \sigma S(t)\,\mathrm{d}W^{\mathbb{Q}}(t) \\
S(t_0) = S_0
\end{cases}
\tag{9.3}
\]
where \(W^{\mathbb{Q}}(t)\) is a Brownian motion under the probability \(\mathbb{Q}\) defined by:
\[
\frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}} = \exp\left(-\int_0^t \lambda(s)\,\mathrm{d}W(s) - \frac{1}{2}\int_0^t \lambda^2(s)\,\mathrm{d}s\right)
\]
We may then apply the Feynman-Kac formula² with \(h(t,x) = r(t)\) and \(g(t,x) = 0\) to
obtain the martingale solution³:
\[
V_0 = \mathbb{E}^{\mathbb{Q}}\left[\left. e^{-\int_0^T r(t)\,\mathrm{d}t} f(S(T)) \,\right|\, \mathcal{F}_0\right]
\tag{9.4}
\]

Remark 96 \(\mathbb{Q}\) is called the risk-neutral probability (or martingale) measure, because the
option price \(V_0\) is the expected discounted value of the payoff⁴.
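
Equation (9.4) can be illustrated by a simple Monte Carlo experiment. The sketch below assumes, for the sake of the example, that \(r\), \(b\) and \(\sigma\) are constant, so that \(S(T)\) is log-normal under \(\mathbb{Q}\) and can be simulated in one step; the function name, the numerical values and the number of simulated paths are our own choices. The estimate is only accurate up to the Monte Carlo error, which is of the order of a few cents here.

```python
import math
import random


def mc_call_price(s0: float, strike: float, maturity: float, r: float,
                  b: float, sigma: float, n_paths: int = 200_000,
                  seed: int = 42) -> float:
    """Monte Carlo estimate of the discounted expected payoff of a call.

    Under Q and with constant parameters, we have
    S(T) = S0 * exp((b - sigma^2 / 2) * T + sigma * sqrt(T) * Z)
    where Z is a standard Gaussian random variable.
    """
    rng = random.Random(seed)
    drift = (b - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        s_T = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_T - strike, 0.0)  # payoff f(S(T)) = (S(T) - K)^+
    return math.exp(-r * maturity) * total / n_paths


# ATM call, 20-week maturity, b = r = 5% and sigma = 20%
price = mc_call_price(100.0, 100.0, 20.0 / 52.0, 0.05, 0.05, 0.20)
print(f"Monte Carlo price = {price:.3f}")
```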
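9.1.1.2 Application to European options

We consider a European call option whose payoff at maturity is equal to:
\[
C(T) = (S(T) - K)^{+}
\]
We assume that the interest rate \(r(t)\) and the cost-of-carry parameter \(b(t)\) are constant.
Then we obtain:
\[
\begin{aligned}
C_0 &= \mathbb{E}^{\mathbb{Q}}\left[\left. e^{-rT}(S(T) - K)^{+} \,\right|\, \mathcal{F}_0\right] \\
&= e^{-rT}\,\mathbb{E}\left[\left(S_0 e^{\left(b - \frac{1}{2}\sigma^2\right)T + \sigma W^{\mathbb{Q}}(T)} - K\right)^{+}\right] \\
&= e^{-rT}\int_{-d_2}^{\infty}\left(S_0 e^{\left(b - \frac{1}{2}\sigma^2\right)T + \sigma\sqrt{T}x} - K\right)\phi(x)\,\mathrm{d}x \\
&= S_0 e^{(b-r)T}\Phi(d_1) - K e^{-rT}\Phi(d_2)
\end{aligned}
\tag{9.5}
\]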


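1 See Appendix A.3.5 on page 1072.
2 See Appendix A.3.4 on page 1070.
3 We assume that the current date \(t_0\) is equal to 0.
4 See Exercise 9.4.1 on page 593 for more details.

where:
\[
d_1 = \frac{1}{\sigma\sqrt{T}}\left(\ln\frac{S_0}{K} + bT\right) + \frac{1}{2}\sigma\sqrt{T}
\]
and:
\[
d_2 = d_1 - \sigma\sqrt{T}
\]
Let us now consider a European put option with the following payoff:
\[
P(T) = (K - S(T))^{+}
\]
We have:
\[
\begin{aligned}
C(T) - P(T) &= (S(T) - K)^{+} - (K - S(T))^{+} \\
&= S(T) - K
\end{aligned}
\]
We deduce that:
\[
\begin{aligned}
C_0 - P_0 &= \mathbb{E}^{\mathbb{Q}}\left[\left. e^{-rT}(S(T) - K) \,\right|\, \mathcal{F}_0\right] \\
&= \mathbb{E}^{\mathbb{Q}}\left[\left. e^{-rT}S(T) \,\right|\, \mathcal{F}_0\right] - K e^{-rT} \\
&= S_0 e^{(b-r)T} - K e^{-rT}
\end{aligned}
\]
This equation is known as the put-call parity. It follows that:
\[
\begin{aligned}
P_0 &= C_0 - S_0 e^{(b-r)T} + K e^{-rT} \\
&= -S_0 e^{(b-r)T}\Phi(-d_1) + K e^{-rT}\Phi(-d_2)
\end{aligned}
\tag{9.6}
\]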


Remark 97 Equations (9.5) and (9.6) are the famous Black-Scholes formulas. Generally, 
they are presented with b = r, that is for physical assets not paying dividends. The cost-of- 
carry concept is explained in the next paragraphs. 
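
Equations (9.5) and (9.6) translate directly into code. The following sketch is a minimal illustration: the function names and the numerical inputs (\(S_0 = K = 100\), \(b = r = 5\%\), \(\sigma = 20\%\), \(T = 0.5\)) are our own choices. It also checks numerically that the two premiums satisfy the put-call parity.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(s0: float, strike: float, maturity: float, r: float,
                  b: float, sigma: float) -> tuple:
    """Return the (call, put) premiums of Equations (9.5) and (9.6)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + b * maturity) / (sigma * sqrt_t) \
        + 0.5 * sigma * sqrt_t
    d2 = d1 - sigma * sqrt_t
    carry = math.exp((b - r) * maturity)   # e^{(b - r) T}
    discount = math.exp(-r * maturity)     # e^{-r T}
    call = s0 * carry * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = -s0 * carry * norm_cdf(-d1) + strike * discount * norm_cdf(-d2)
    return call, put


s0, strike, maturity, r, b, sigma = 100.0, 100.0, 0.5, 0.05, 0.05, 0.20
call, put = black_scholes(s0, strike, maturity, r, b, sigma)

# Put-call parity: C0 - P0 = S0 e^{(b - r) T} - K e^{-r T}
parity_gap = (call - put) - (s0 * math.exp((b - r) * maturity)
                             - strike * math.exp(-r * maturity))
print(f"call = {call:.2f}, put = {put:.2f}, parity gap = {parity_gap:.1e}")
```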


We consider a call option on an asset, whose cost-of-carry is equal to 5%. We also assume
that the interest rate is equal to 5%. Figure 9.1 represents the option premium with respect
to the current value \(S_0\) of the asset. We notice that the price of the call option increases
with the current price \(S_0\), the volatility \(\sigma\) and the maturity \(T\). In Figure 9.2, we report the
option premium of the put option. In both cases, it may be interesting to decompose the
option premium into two components:


• The intrinsic value is the value of exercising the option now:
\[
IV(t) = f(S_0)
\]
For instance, the intrinsic value of the call option is equal to \((S_0 - K)^{+}\). If the intrinsic
value is positive, the option is said to be in-the-money (ITM). If the intrinsic value is equal
to zero, the option is at-the-money (ATM) or out-of-the-money (OTM).


• The time value is the difference between the option premium and the intrinsic value:
\[
TV(t) = V(t_0,S_0) - IV(t)
\]
This quantity is always positive and is related to the risk that the intrinsic value will
increase with the time-to-maturity.


FIGURE 9.1: Price of the call option


FIGURE 9.2: Price of the put option


9.1.1.3 Principle of dynamic hedging


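Self-financing strategy We consider \(n\) assets that do not pay dividends or coupons
during the period \([0,T]\) and we assume that the price vector \(S(t)\) follows a diffusion process.
For asset \(i\), we have then:
\[
S_i(t) = S_i(0) + \int_0^t \mu_i(u)\,\mathrm{d}u + \int_0^t \sigma_i(u)\,\mathrm{d}W_i(u)
\]
We set up a trading portfolio \((\phi_1(t),\ldots,\phi_n(t))\) invested in the assets \((S_1(t),\ldots,S_n(t))\).
We note \(X(t)\) the value of this portfolio:
\[
X(t) = \sum_{i=1}^{n}\phi_i(t)S_i(t)
\]
We say that the portfolio is self-financing if the following conditions hold:
\[
\begin{cases}
\mathrm{d}X(t) = \sum_{i=1}^{n}\phi_i(t)\,\mathrm{d}S_i(t) \\
X(0) = 0
\end{cases}
\]
The first condition means that all trades are financed by selling or buying assets in the
portfolio, whereas the second condition implies that we don’t need money to set up the
initial portfolio. This implies that:
\[
\begin{aligned}
X(t) &= X_0 + \sum_{i=1}^{n}\int_0^t \phi_i(u)\,\mathrm{d}S_i(u) \\
&= \sum_{i=1}^{n}\int_0^t \phi_i(u)\mu_i(u)\,\mathrm{d}u + \sum_{i=1}^{n}\int_0^t \phi_i(u)\sigma_i(u)\,\mathrm{d}W_i(u)
\end{aligned}
\]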


In the Black-Scholes model, we consider a stock that does not pay dividends or coupons
during the period \([0,T]\) and we assume that its price process \(S(t)\) follows a geometric
Brownian motion:
\[
\mathrm{d}S(t) = \mu S(t)\,\mathrm{d}t + \sigma S(t)\,\mathrm{d}W(t)
\]
We also assume the existence of a risk-free asset \(B(t)\) that satisfies:
\[
\mathrm{d}B(t) = rB(t)\,\mathrm{d}t
\]
We set up a trading portfolio \((\phi(t),\psi(t))\) invested in the stock \(S(t)\) and the risk-free asset
\(B(t)\). We note \(V(t)\) the value of this portfolio:
\[
V(t) = \phi(t)S(t) + \psi(t)B(t)
\]
We now form a strategy \(X(t)\) in which we are long the call option \(C(t,S(t))\) and short the
trading portfolio \(V(t)\):
\[
X(t) = C(t,S(t)) - V(t)
\]
Using Itô’s lemma, we have:
\[
\begin{aligned}
\mathrm{d}X(t) ={}& \partial_S C(t,S(t))\,\mathrm{d}S(t) + \left(\partial_t C(t,S(t)) + \frac{1}{2}\sigma^2 S^2(t)\,\partial_S^2 C(t,S(t))\right)\mathrm{d}t \\
& - \phi(t)\,\mathrm{d}S(t) - \psi(t)\,\mathrm{d}B(t)
\end{aligned}
\]
By assuming that \(\phi(t) = \partial_S C(t,S(t))\), we obtain:
\[
\mathrm{d}X(t) = \left(\partial_t C(t,S(t)) + \frac{1}{2}\sigma^2 S^2(t)\,\partial_S^2 C(t,S(t)) - r\psi(t)B(t)\right)\mathrm{d}t
\]
\(X(t)\) is self-financing if \(\mathrm{d}X(t) = 0\) or:
\[
\psi(t) = \frac{\partial_t C(t,S(t)) + \frac{1}{2}\sigma^2 S^2(t)\,\partial_S^2 C(t,S(t))}{rB(t)}
\]
We deduce that:
\[
\begin{aligned}
C(t,S(t)) &= \phi(t)S(t) + \psi(t)B(t) \\
&= \partial_S C(t,S(t))\,S(t) + \frac{\partial_t C(t,S(t)) + \frac{1}{2}\sigma^2 S^2(t)\,\partial_S^2 C(t,S(t))}{rB(t)}\,B(t)
\end{aligned}
\]
This implies that \(C(t,S(t))\) satisfies the following PDE:
\[
\frac{1}{2}\sigma^2 S^2 \partial_S^2 C(t,S) + rS\,\partial_S C(t,S) + \partial_t C(t,S) - rC(t,S) = 0
\]
Since \(X(t)\) is self-financing (\(X(t) = 0\)), we also deduce that the trading portfolio \(V(t)\) is
the replicating portfolio of the call option:
\[
\begin{aligned}
V(t) &= \phi(t)S(t) + \psi(t)B(t) \\
&= C(t,S(t)) - X(t) \\
&= C(t,S(t))
\end{aligned}
\]

If we define the replicating cost as follows:
\[
\begin{aligned}
\mathcal{C}(t) &= \int_0^t \phi(u)\,\mathrm{d}S(u) + \int_0^t \psi(u)\,\mathrm{d}B(u) \\
&= \int_0^t \mu S(u)\phi(u)\,\mathrm{d}u + \int_0^t \sigma S(u)\phi(u)\,\mathrm{d}W(u) + r\int_0^t \psi(u)B(u)\,\mathrm{d}u
\end{aligned}
\]
we obtain:
\[
\begin{aligned}
\mathcal{C}(t) ={}& \int_0^t \mu S(u)\,\partial_S C(u,S(u))\,\mathrm{d}u + \int_0^t \sigma S(u)\,\partial_S C(u,S(u))\,\mathrm{d}W(u) \\
& + \int_0^t \left(\partial_t C(u,S(u)) + \frac{1}{2}\sigma^2 S^2(u)\,\partial_S^2 C(u,S(u))\right)\mathrm{d}u \\
={}& \int_0^t \mathrm{d}C(u,S(u)) \\
={}& C(t,S(t)) - C(0,S_0)
\end{aligned}
\]
We verify that the replicating cost is exactly equal to the P&L of the long exposure on the
call option.

Cost-of-carry When the stock does not pay dividends, the cost-of-carry parameter \(b\) is
equal to the interest rate \(r\). If the stock pays a continuous dividend yield \(\delta\), the
self-financing portfolio is:
\[
X(t) = C(t,S(t)) - \phi(t)S(t) - \psi(t)B(t)
\]
We deduce that the change in the value of this portfolio is:
\[
\mathrm{d}X(t) = \mathrm{d}C(t,S(t)) - \phi(t)\,\mathrm{d}S(t) - \psi(t)\,\mathrm{d}B(t) - \underbrace{\phi(t)\cdot\delta\cdot S(t)\,\mathrm{d}t}_{\text{dividend}}
\]
Using the same rationale as previously, we obtain \(\phi(t) = \partial_S C(t,S(t))\) and:
\[
\psi(t) = \frac{\partial_t C(t,S(t)) + \frac{1}{2}\sigma^2 S^2(t)\,\partial_S^2 C(t,S(t)) - \delta S(t)\,\partial_S C(t,S(t))}{rB(t)}
\]
Finally, we obtain the following PDE:
\[
\frac{1}{2}\sigma^2 S^2 \partial_S^2 C(t,S) + (r - \delta)S\,\partial_S C(t,S) + \partial_t C(t,S) - rC(t,S) = 0
\]
The cost-of-carry parameter \(b\) is now equal to \(r - \delta\). It is the percentage cost required
to carry the asset. Generally, the cost is equal to the interest rate \(r\), but a continuous
dividend reduces this cost. In the case of futures or forward contracts, the cost-of-carry is
equal to zero. Indeed, the price of such contracts already incorporates the cost-of-carry of
the underlying asset. For currency options, the cost-of-carry is the difference between the
domestic interest rate \(r\) and the foreign interest rate \(r^{\star}\).


TABLE 9.1: Impact of the dividend on the option premium


                    Put option                        Call option
 K / δ      0.00    0.02    0.05    0.07      0.00    0.02    0.05    0.07
   90       1.28    1.44    1.73    1.94     13.50   12.67   11.48   10.72
  100       4.42    4.83    5.50    5.97      6.89    6.31    5.50    5.00
  110      10.19   10.87   11.91   12.63      2.91    2.59    2.16    1.90


In order to illustrate the impact of the cost-of-carry, we have calculated the option
premium in Table 9.1 with the following parameters: \(S_0 = 100\), \(r = 5\%\), \(\sigma = 20\%\) and a
six-month maturity, for three values of the strike \(K\). In the case of the put option, the price
increases with the dividend yield \(\delta\) whereas it decreases in the case of the call option. In
order to understand these figures, we have to come back to the definition of the replicating
portfolio. A call option is replicated using a portfolio that is long on the asset. This implies
that the replicating portfolio benefits from the dividends paid by the asset. The self-financing
property of the strategy implies that we have to borrow less money. This is why the premium
of the call option is lower when the asset pays a dividend. For the put option, the opposite
holds. The replicating portfolio is short on the asset. Therefore, it does not receive the
dividends, but pays them.
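
The figures of Table 9.1 can be recomputed with the Black-Scholes formulas by setting \(b = r - \delta\). The sketch below is an illustration under the following assumptions, which reproduce the tabulated values up to rounding: \(S_0 = 100\), \(\sigma = 20\%\), \(r = 5\%\), \(T = 0.5\) and strikes equal to 90, 100 and 110. The function and variable names are ours.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def option_premiums(s0: float, strike: float, maturity: float, r: float,
                    dividend_yield: float, sigma: float) -> tuple:
    """Return (call, put) when the cost-of-carry is b = r - dividend yield."""
    b = r - dividend_yield
    vol = sigma * math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + b * maturity) / vol + 0.5 * vol
    d2 = d1 - vol
    carry = math.exp((b - r) * maturity)
    discount = math.exp(-r * maturity)
    call = s0 * carry * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = -s0 * carry * norm_cdf(-d1) + strike * discount * norm_cdf(-d2)
    return call, put


premiums = {}
for strike in (90, 100, 110):
    for dividend_yield in (0.00, 0.02, 0.05, 0.07):
        premiums[(strike, dividend_yield)] = option_premiums(
            100.0, float(strike), 0.5, 0.05, dividend_yield, 0.20)

for (strike, dividend_yield), (call, put) in premiums.items():
    print(f"K = {strike:3d}  delta = {dividend_yield:.2f}  "
          f"put = {put:6.2f}  call = {call:6.2f}")
```

For a given strike, the call premium decreases and the put premium increases when the dividend yield goes from 0% to 7%, which is the pattern discussed above.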


Remark 98 The value of dividends is an example of model risk. Indeed, future dividends 
are uncertain, meaning that there is a risk of undervaluation of the option premium. In 
the case of a call option, the risk is to use expected dividends that are higher than realized 
values. In the case of a put option, the risk is to use dividends that are too low.




Delta hedging The Black-Scholes model assumes that the replicating portfolio is rebalanced
continuously. In practice, it is rebalanced at some fixed dates \(t_i\):
\[
0 = t_0 < t_1 < \cdots < t_n = T
\]
At the initial date, we have:
\[
X(t_0) = C(t_0,S(t_0)) - V(t_0) = 0
\]
where:
\[
V(t_0) = \phi(t_0)\cdot S(t_0) + \psi(t_0)\cdot B(t_0)
\]
Because we have \(\phi(t_0) = \Delta(t_0)\) and \(X(t_0) = 0\), we deduce that⁵:
\[
\psi(t_0) = C(t_0,S(t_0)) - \Delta(t_0)S(t_0)
\]
At time \(t_1\), the value of the replicating portfolio is then equal to:
\[
V(t_1) = \Delta(t_0)S(t_1) + \left(C(t_0,S(t_0)) - \Delta(t_0)S(t_0)\right)\cdot\left(1 + r(t_0)(t_1 - t_0)\right)
\tag{9.7}
\]
It follows that:
\[
X(t_1) = C(t_1,S(t_1)) - V(t_1)
\]
Therefore, we are not sure that \(X(t_1) = 0\) because it is not possible to hedge the jump
\(S(t_1) - S(t_0)\). We rebalance the portfolio and we have:
\[
V(t_1) = \phi(t_1)\cdot S(t_1) + \psi(t_1)\cdot B(t_1)
\]
We deduce that:
\[
\phi(t_1) = \Delta(t_1)
\]
and:
\[
\psi(t_1) = V(t_1) - \Delta(t_1)S(t_1)
\]
At time \(t_2\), the value of the replicating portfolio is equal to:
\[
V(t_2) = \Delta(t_1)S(t_2) + \left(V(t_1) - \Delta(t_1)S(t_1)\right)\cdot\left(1 + r(t_1)(t_2 - t_1)\right)
\tag{9.8}
\]
Equation (9.8) differs from Equation (9.7) because we don’t have \(V(t_1) = C(t_1,S(t_1))\).
More generally, we have:
\[
X(t_i) = C(t_i,S(t_i)) - V(t_i)
\]
and:
\[
V(t_i) = \underbrace{\Delta(t_{i-1})S(t_i)}_{V_S(t_i)} + \underbrace{\left(V(t_{i-1}) - \Delta(t_{i-1})S(t_{i-1})\right)\left(1 + r(t_{i-1})(t_i - t_{i-1})\right)}_{V_B(t_i)}
\]
where \(V_S(t_i)\) is the component due to the delta exposure on the asset and \(V_B(t_i)\) is the
component due to the cash exposure on the risk-free bond. We notice that:
\[
\begin{aligned}
V_S(t_i) &= \Delta(t_{i-1})\cdot S(t_i) \\
&= \Delta(t_{i-1})\cdot S(t_{i-1})\cdot\left(1 + R_S(t_{i-1};t_i)\right)
\end{aligned}
\]
and:
\[
\begin{aligned}
V_B(t_i) &= \left(V(t_{i-1}) - \Delta(t_{i-1})\cdot S(t_{i-1})\right)\left(1 + r(t_{i-1})\cdot(t_i - t_{i-1})\right) \\
&= \left(V(t_{i-1}) - \Delta(t_{i-1})\cdot S(t_{i-1})\right)\left(1 + R_B(t_{i-1};t_i)\right)
\end{aligned}
\]
where \(R_S(t_{i-1};t_i)\) and \(R_B(t_{i-1};t_i)\) are the asset and bond returns between \(t_{i-1}\) and \(t_i\). At
the maturity, we obtain:
\[
\begin{aligned}
X(T) &= X(t_n) \\
&= (S(T) - K)^{+} - V(t_n)
\end{aligned}
\]
\(\Pi(T) = -X(T)\) is the P&L of the delta hedging strategy. To measure its efficiency, we
consider the ratio \(\pi\) defined as follows:
\[
\pi = \frac{\Pi(T)}{C(t_0,S(t_0))}
\]

5 Without any loss of generality, we take the convention that \(B(t_i) = 1\).

Example 78 We consider the replication of 100 ATM call options. The current price of the
asset is 100 and the maturity of the option is 20 weeks. We consider the following parameters:
\(b = r = 5\%\) and \(\sigma = 20\%\). We rebalance the replicating portfolio every week.


Since the maturity \(T\) is equal to 20/52 and the strike \(K\) is equal to 100, the current value
\(C(t_0,S(t_0))\) of the call option is equal to $5.90. The replicating portfolio is rebalanced at
times \(t_i\):
\[
t_i = \frac{i}{52}
\]
In Table 9.2, we have reported a simulated path of the underlying asset. We have \(S(t_0) = 100\),
\(S(t_1) = 95.63\), \(S(t_2) = 95.67\), etc. At the maturity date, the price of the underlying
asset is equal to 101.83. In the Black-Scholes model, the delta is equal to:
\[
\Delta(t) = e^{(b-r)(T-t)}\Phi(d_1)
\]
where:
\[
d_1 = \frac{1}{\sigma\sqrt{T-t}}\left(\ln\frac{S(t)}{K} + b(T-t)\right) + \frac{1}{2}\sigma\sqrt{T-t}
\]
At each rebalancing date \(t_{i-1}\), we compute the delta \(\Delta(t_{i-1})\) with respect to the price
\(S(t_{i-1})\) and the remaining maturity \(T - t_{i-1}\). We can then deduce the values of \(V_S(t_i)\),
\(V_B(t_i)\) and \(V(t_i)\). We can also calculate the new value \(C(t_i,S(t_i))\) of the call option and
compare it with \(V(t_i)\) in order to define \(X(t_i)\) and \(\Pi(t_i) = -X(t_i)\). We obtain \(\Pi(T) = -29.76\),
implying that:
\[
\pi = \frac{-29.76}{100 \times 5.90} = -5.04\%
\]
In this case, the delta hedging strategy has produced a negative P&L. If we consider another
path of the underlying asset, we can also obtain a positive P&L (see Table 9.3).
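
The recursion of Example 78 can be sketched in a few lines of Python. The code below replays the path of Table 9.2 with weekly rebalancing and simple interest on the cash position, as in Equations (9.7) and (9.8). It is only an illustration: the function names are ours, and because the tabulated prices are rounded to two decimals, the output matches the table only approximately.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price_and_delta(s, strike, tau, r, b, sigma):
    """Black-Scholes call price and delta for a remaining maturity tau."""
    if tau <= 0.0:  # at maturity, the price is the payoff
        return max(s - strike, 0.0), (1.0 if s > strike else 0.0)
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / strike) + b * tau) / vol + 0.5 * vol
    d2 = d1 - vol
    carry = math.exp((b - r) * tau)
    price = s * carry * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)
    return price, carry * norm_cdf(d1)


def delta_hedge(path, strike, maturity, r, b, sigma, n_options=100):
    """Return rows (i, V(t_i), C(t_i, S(t_i)), Pi(t_i)) along the path."""
    n = len(path) - 1
    dt = maturity / n
    price, delta = call_price_and_delta(path[0], strike, maturity, r, b, sigma)
    v = n_options * price      # V(t_0) = C(t_0, S(t_0))
    delta *= n_options         # number of shares held over [t_0, t_1]
    rows = [(0, v, v, 0.0)]
    for i in range(1, n + 1):
        v_s = delta * path[i]                              # asset component
        v_b = (v - delta * path[i - 1]) * (1.0 + r * dt)   # cash component
        v = v_s + v_b
        tau = maturity * (n - i) / n                       # remaining maturity
        price, new_delta = call_price_and_delta(path[i], strike, tau, r, b, sigma)
        c = n_options * price
        rows.append((i, v, c, v - c))                      # Pi = -X = V - C
        delta = n_options * new_delta                      # rebalancing
    return rows


# Simulated path reported in Table 9.2
PATH = [100.00, 95.63, 95.67, 94.18, 92.73, 96.59, 101.68, 101.41, 100.22,
        99.32, 101.64, 101.81, 102.62, 107.56, 102.05, 100.88, 106.90,
        107.66, 101.79, 101.76, 101.83]

rows = delta_hedge(PATH, strike=100.0, maturity=20.0 / 52.0,
                   r=0.05, b=0.05, sigma=0.20)
final_pnl = rows[-1][3]
pi_ratio = final_pnl / rows[0][2]
print(f"P&L = {final_pnl:.2f}, ratio = {pi_ratio:.2%}")
```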


We now assume that \(S(t)\) is generated by the risk-neutral SDE:
\[
\mathrm{d}S(t) = rS(t)\,\mathrm{d}t + \sigma S(t)\,\mathrm{d}W^{\mathbb{Q}}(t)
\]
We estimate the probability density function of \(\pi\) by simulating 10000 trajectories of the
asset price and calculating the final P&L of the delta hedging strategy. We consider the

TABLE 9.2: An example of delta hedging strategy (negative P&L)


 i   t_i    S(t_i)  Δ(t_{i-1})  V_S(t_i)   V_B(t_i)   V(t_i)  C(t_i,S(t_i))   X(t_i)   Π(t_i)
 0   0.00   100.00     0.00        0.00     590.90    590.90     590.90         0.00     0.00
 1   0.02    95.63    58.59     5603.15   -5273.36    329.79     350.22        20.43   -20.43
 2   0.04    95.67    43.72     4182.80   -3854.96    327.84     336.15         8.31    -8.31
 3   0.06    94.18    43.24     4072.36   -3812.62    259.75     260.57         0.82    -0.82
 4   0.08    92.73    37.29     3457.72   -3255.16    202.55     196.22        -6.33     6.33
 5   0.10    96.59    31.34     3027.23   -2706.31    320.93     326.47         5.54    -5.54
 6   0.12   101.68    44.63     4537.99   -3993.73    544.26     582.71        38.45   -38.45
 7   0.13   101.41    63.39     6428.19   -5906.72    521.47     545.64        24.17   -24.17
 8   0.15   100.22    62.36     6249.97   -5808.29    441.68     453.62        11.94   -11.94
 9   0.17    99.32    57.57     5718.25   -5333.51    384.74     382.58        -2.16     2.16
10   0.19   101.64    53.46     5433.52   -4929.49    504.03     495.99        -8.04     8.04
11   0.21   101.81    63.27     6441.30   -5932.22    509.08     483.87       -25.21    25.21
12   0.23   102.62    64.10     6578.19   -6022.97    555.22     513.53       -41.69    41.69
13   0.25   107.56    67.97     7311.26   -6426.42    884.84     876.68        -8.16     8.16
14   0.27   102.05    86.90     8867.94   -8470.05    397.89     424.07        26.18   -26.18
15   0.29   100.88    66.19     6677.01   -6362.67    314.34     321.76         7.41    -7.41
16   0.31   106.90    59.86     6399.37   -5730.15    669.21     756.02        86.80   -86.80
17   0.33   107.66    90.82     9723.75   -8994.54    729.22     806.47        77.25   -77.25
18   0.35   101.79    94.74     9643.97   -9480.00    163.96     276.24       112.27  -112.27
19   0.37   101.76    69.88     7111.04   -6955.85    155.19     228.08        72.89   -72.89
20   0.38   101.83    75.10     7647.28   -7494.04    153.24     183.00        29.76   -29.76
TABLE 9.3: An example of delta hedging strategy (positive P&L)

 i   t_i  S(t_i)  Δ(t_{i-1})  V_S(t_i)  V_B(t_i)  V(t_i)  C(t_i,S(t_i))   X(t_i)   Π(t_i)
 0  0.00  100.00        0.00      0.00    590.90  590.90         590.90     0.00     0.00
 1  0.02   98.50       58.59   5771.31  -5273.36  497.95         489.70    -8.25     8.25
 2  0.04   97.00       53.45   5184.51  -4771.31  413.19         396.75   -16.44    16.44
 3  0.06   95.47       47.89   4571.99  -4236.14  335.85         311.62   -24.24    24.24
 4  0.08   98.17       41.87   4110.19  -3664.81  445.38         419.94   -25.44    25.44
 5  0.10  100.48       51.10   5134.88  -4575.85  559.03         528.68   -30.35    30.35
 6  0.12  102.92       59.19   6092.33  -5394.04  698.28         664.00   -34.29    34.29
 7  0.13  105.50       67.69   7140.94  -6274.05  866.89         829.99   -36.90    36.90
 8  0.15  101.81       76.13   7750.53  -7171.44  579.09         550.21   -28.88    28.88
 9  0.17  100.65       63.86   6427.97  -5928.66  499.31         457.48   -41.83    41.83
10  0.19   98.86       59.15   5847.59  -5459.40  388.19         337.04   -51.15    51.15
11  0.21   99.26       50.91   5053.11  -4649.03  404.09         335.31   -68.78    68.78
12  0.23  101.78       52.25   5317.65  -4786.50  531.15         458.03   -73.12    73.12
13  0.25   99.28       64.14   6367.78  -6002.74  365.03         288.19   -76.84    76.84
14  0.27   99.19       51.19   5077.96  -4722.07  355.89         257.52   -98.36    98.36
15  0.29   95.53       49.97   4773.36  -4604.77  168.59          92.40   -76.18    76.18
16  0.31   98.02       26.47   2594.85  -2362.61  232.23         148.05   -84.19    84.19
17  0.33   97.03       39.61   3843.35  -3653.84  189.51          83.97  -105.54   105.54
18  0.35   96.64       29.34   2835.17  -2659.65  175.51          44.51  -131.01   131.01
19  0.37   95.01       21.11   2005.37  -1866.05  139.32           3.75  -135.56   135.56
20  0.38   93.67        3.62    338.73   -204.45  134.27           0.00  -134.27   134.27
previous example, but the maturity is now fixed at 130 trading days⁶. Figure 9.3 represents
the density function for different fixed rebalancing frequencies⁷. We notice that $\pi$ is
approximately a Gaussian random variable, which is centered around 0. However, the variance
depends on the rebalancing frequency. In Figure 9.4, we have reported the relationship
between the hedging efficiency $\sigma(\pi)$ and the rebalancing frequency. The standard deviation
$\sigma(\pi)$ shrinks as the rebalancing interval shortens, which confirms that we can
perfectly replicate the option with a continuous rebalancing.
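The effect of the rebalancing frequency can be explored with a small Monte Carlo experiment. The sketch below is a simplified illustration, not the code used for Figures 9.3 and 9.4: the helper names (`hedging_ratio`, `efficiency`), the number of paths and the seed are our own choices, and we assume $b = r$ and a continuously compounded cash account. It simulates the asset under the risk-neutral SDE, hedges a short ATM call at $n$ equally spaced dates and returns the standard deviation of $\pi$.

```python
import math
import random
import statistics

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma, r):
    """Black-Scholes call price and delta when b = r."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / k) + r * tau) / vol + 0.5 * vol
    price = s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d1 - vol)
    return price, norm_cdf(d1)

def hedging_ratio(n_steps, rng, s0=100.0, k=100.0, T=0.5, sigma=0.20, r=0.05):
    """Simulate one path and return pi = Pi(T) / C(t0, S(t0))."""
    dt = T / n_steps
    premium, delta = bs_call(s0, k, T, sigma, r)
    s, cash = s0, premium - delta * s0
    for i in range(1, n_steps + 1):
        z = rng.gauss(0.0, 1.0)
        s *= math.exp((r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
        cash *= math.exp(r * dt)              # cash accrues at rate r
        if i < n_steps:                       # rebalance to the new delta
            _, new_delta = bs_call(s, k, T - i * dt, sigma, r)
            cash -= (new_delta - delta) * s
            delta = new_delta
    return (delta * s + cash - max(s - k, 0.0)) / premium

def efficiency(n_steps, n_paths=500, seed=0):
    """Standard deviation of pi over simulated paths (hypothetical helper)."""
    rng = random.Random(seed)
    return statistics.pstdev([hedging_ratio(n_steps, rng) for _ in range(n_paths)])

for days in (26, 13, 5, 1):                   # rebalancing step in trading days
    print(days, round(100 * efficiency(130 // days), 2))
```

With 130 trading days to maturity, a step of 26 days corresponds to 5 rebalancing dates and a step of 1 day to 130 dates; the dispersion of $\pi$ should decrease as the step shortens.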


[Figure: density curves for dt = 26, 13, 10, 5, 2 and 1 days; horizontal axis: π (in %)]

FIGURE 9.3: Probability density function of the hedging ratio π


Let us now understand how the hedging ratio is impacted by the dynamics of the un- 
derlying asset. We consider again the previous example and simulate one trajectory (see the 
first panel in Figure 9.5). We hedge the call option every half hour. At maturity, the
hedging ratio is equal to 1.8%. The maximum is reached at time t = 0.466 and is equal to
3.5%. We now introduce a jump at time t = 0.25. This jump induces a large negative P&L 
for the trader, whatever the sign of the jump (see the second and third panels in Figure 
9.5). If we introduce a jump later at time t = 0.40, the cost depends on the magnitude 
and the sign of the jump (Figure 9.6). A positive jump has no impact on the cost of the 
replicating portfolio, whereas a negative jump has an impact only if the jump is very large. 
To understand these results, we have to analyze the delta coefficient. At time t = 0.40, the 
option is in-the-money and the delta is close to 1. This implies that a positive jump has
little impact on the delta hedging, because the delta is bounded by one. If there is a small
negative jump, the impact is also limited because the delta is only slightly reduced. However,
in the case of a large negative jump, the impact may be significant because the delta can be
dramatically reduced. We also observe the same results when the option is highly out-of-the-money and
the delta is close to zero. In this case, a negative jump has no impact, because it decreases 


⁶ We assume that a year corresponds to 260 trading days. This implies that the maturity of the option
is exactly one-half year.
⁷ We note $t_i - t_{i-1} = dt$.
FIGURE 9.4: Relationship between the hedging efficiency $\sigma(\pi)$ and the hedging frequency


the delta but the delta is bounded by zero. Conversely, a positive jump may have an impact
if its magnitude is sufficiently large to increase the delta.


In the case of liquid markets with low transaction costs, a delta neutral hedging may be
efficiently implemented on a high-frequency basis (daily or intra-day rebalancing). This is
not the case of less liquid markets. Moreover, we observe an asymmetry between call and
put options. The delta of call options is positive, implying that the replicating portfolio is
long on the asset. For put options, the delta is negative and the replicating portfolio is short
on the asset. We know that it is easier to implement a long position than a short position.
Sometimes, it is even impossible to be short. For instance, this explains why there exist call
options on mutual funds, but not put options on mutual funds. We understand that model
risk of derivatives does not only concern the right values of model parameters. In fact,
model risk also concerns the hedging management of the option, including the feasibility
and efficiency of the delta hedging strategy. A famous example is the difference between a
put option on the S&P 500 index and a put option on the Eurostoxx 50 index. We know that
the returns of the Eurostoxx 50 index present more discontinuous patterns than those of the
S&P 500 index. The reason is that European markets react more strongly to American markets
than the opposite. This explains why the difference between the closing price and the opening
price is larger in European markets than in American markets. Therefore, a put option on
the Eurostoxx 50 index contains an additional premium compared to a put option on the
S&P 500 index in order to take into account these stylized facts.


Greek sensitivities We have seen that the delta of the call option is defined by:

$$\Delta(t) = \frac{\partial C(t, S(t))}{\partial S(t)}$$

FIGURE 9.5: Impact of a jump on the hedging ratio $\pi(t)$
FIGURE 9.6: Impact of a jump on the hedging ratio $\pi(t)$

We have then:
$$C(t+dt, S(t+dt)) - C(t, S(t)) \approx \Delta(t) \cdot (S(t+dt) - S(t))$$


This Taylor expansion can be extended to other orders and other parameters. For instance,
the delta-gamma-theta approximation is:

$$C(t+dt, S(t+dt)) - C(t, S(t)) \approx \Delta(t) \cdot (S(t+dt) - S(t)) + \frac{1}{2}\Gamma(t) \cdot (S(t+dt) - S(t))^2 + \Theta(t) \cdot ((t+dt) - t)$$


where the gamma is the second-order derivative of the call option price with respect to the
underlying asset price:

$$\Gamma(t) = \frac{\partial^2 C(t, S(t))}{\partial S(t)^2} = \frac{\partial \Delta(t)}{\partial S(t)}$$

and the theta is the derivative of the call option price with respect to the time:

$$\Theta(t) = \frac{\partial C(t, S(t))}{\partial t} = -\frac{\partial C(t, S(t))}{\partial T}$$

A positive theta coefficient implies that the option value increases if nothing changes, in
particular the price of the underlying asset. By construction, the theta is related to the
time value of the option. This is why the theta is generally low for options with a short
maturity. In fact, understanding theta effects is complicated, because the theta coefficient
is not monotonic in any of the parameters (underlying price, volatility and maturity). We
recall that the option price satisfies the PDE:

$$\frac{1}{2}\sigma^2 S^2 \Gamma + bS\Delta + \Theta - rC = 0$$

We deduce that the theta of the option can be calculated as follows:

$$\Theta = rC - \frac{1}{2}\sigma^2 S^2 \Gamma - bS\Delta$$

This equation shows that the different coefficients are highly related.
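The relationship between the three coefficients can be checked numerically. The following Python sketch is an illustration under our own choices: the function name `call_greeks`, the bump size and the shocks `ds` and `dt` are hypothetical, and the parameters ($S_0 = 98$, $K = 100$, $\sigma = 10\%$, $b = r = 5\%$, $T = 0.25$) are those used later for Figure 9.10. It computes the theta from the PDE, compares it with a finite difference in calendar time, and compares the delta and delta-gamma-theta approximations with a full repricing.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_greeks(s, k, tau, sigma, b, r):
    """Return (price, delta, gamma, theta) of a Black-Scholes call."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / k) + b * tau) / vol + 0.5 * vol
    d2 = d1 - vol
    carry = math.exp((b - r) * tau)
    price = s * carry * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)
    delta = carry * norm_cdf(d1)
    gamma = carry * norm_pdf(d1) / (s * vol)
    # Theta deduced from the PDE: Theta = r C - 0.5 sigma^2 S^2 Gamma - b S Delta
    theta = r * price - 0.5 * sigma**2 * s**2 * gamma - b * s * delta
    return price, delta, gamma, theta

s0, k, sigma, b, r, tau = 98.0, 100.0, 0.10, 0.05, 0.05, 0.25
price, delta, gamma, theta = call_greeks(s0, k, tau, sigma, b, r)

# Cross-check theta with a central finite difference in calendar time
# (moving forward in time by h reduces the remaining maturity by h).
h = 1e-5
theta_fd = (call_greeks(s0, k, tau - h, sigma, b, r)[0]
            - call_greeks(s0, k, tau + h, sigma, b, r)[0]) / (2 * h)

# Compare the Taylor approximations with a full repricing.
ds, dt = 2.0, 0.01
exact = call_greeks(s0 + ds, k, tau - dt, sigma, b, r)[0] - price
delta_only = delta * ds
delta_gamma_theta = delta * ds + 0.5 * gamma * ds**2 + theta * dt

print(f"delta={delta:.4f} gamma={gamma:.4f} theta={theta:.4f} (fd {theta_fd:.4f})")
print(f"exact={exact:.4f} delta={delta_only:.4f} dgt={delta_gamma_theta:.4f}")
```

For this shock, the gamma and theta terms have opposite signs, so the delta-only approximation misses two partially offsetting effects that the delta-gamma-theta expansion captures.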


Example 79 We consider a call option, whose strike $K$ is equal to 100. The risk-free rate
and the cost-of-carry parameter are equal to 5%. For the volatility coefficient, we consider
two cases: (a) $\sigma = 20\%$ and (b) $\sigma = 50\%$.


In Figure 9.7, we have reported the option delta for different values of the asset price
$S_0$ and different values of the maturity $T$. We have $\Delta(t) \in [0,1]$. The delta is close to
zero when the asset price is far below the option strike, whereas it is close to one when
the option is highly in-the-money. We also notice that the coefficient $\Delta$ is an increasing
function of the price of the underlying asset. The relationship between the option delta and
the maturity parameter is not monotonic and depends on whether the option is in-the-money
or out-of-the-money. In a similar way, the impact of the volatility is not obvious, and may
be different if the option maturity is long or short.


Figure 9.8 represents the option gamma⁸. It is close to zero when the current price of
the underlying asset is far from the option strike. In this case, the option trader does not
need to revise the delta exposure frequently. The gamma coefficient is maximum in the
at-the-money region or when the delta is close to 50%. In this situation, the delta can vary
considerably and the trader must rebalance the replicating portfolio more frequently in order to
reduce the residual risk.

⁸ See Exercise 2.4.7 on page 121 for the analytical expression of the different sensitivity coefficients of the
call option.

FIGURE 9.7: Delta coefficient of the call option
FIGURE 9.8: Gamma coefficient of the call option


Let us assume a delta neutral hedging portfolio. The trader can face four configurations 
of residual risk given by the following table: 


            Γ < 0               Γ > 0
Θ < 0   (Γ < 0, Θ < 0)     (Γ > 0, Θ < 0)
Θ > 0   (Γ < 0, Θ > 0)     (Γ > 0, Θ > 0)


The configuration ($\Gamma < 0$, $\Theta < 0$) is not realistic, because the trader would not agree to build
a portfolio whose P&L is almost surely negative. The configuration ($\Gamma > 0$, $\Theta > 0$) is also
not realistic, because it would mean that the P&L is always positive whatever the market conditions.
Therefore, two main configurations are interesting:


(a) a negative gamma exposure with a positive theta; 
(b) a positive gamma exposure with a negative theta. 


We have represented these two cases in Figure 9.9, and we notice that they lead to different
P&L profiles⁹:


(a) If the gamma is negative, the best situation is obtained when the asset price does not 
move. Any changes in the asset price reduce the P&L, which can be negative if the 
gamma effect is more important than the theta effect. We also notice that the gain is 
bounded and the loss is unbounded in this configuration. 


(b) If the theta is negative, the loss is bounded and maximum when the asset price does 
not move. Any changes in the asset price increase the P&L because the gamma is 
positive. In this configuration, the gain is unbounded. 


In order to understand these P&L profiles, we have represented the gamma and theta effects
in Figure 9.10 for the case (b). The portfolio is long on a call option and short on the delta
neutral hedging strategy. The parameters are the following: $S_0 = 98$, $K = 100$, $\sigma = 10\%$,
$b = 5\%$, $r = 5\%$ and $T = 0.25$. The value of the option is equal to 1.601 and we have
$\Delta(t_0) = 44.87\%$. In the first panel in Figure 9.10, we have reported the option price (solid
curve) and the delta hedging strategy (dashed line) at the current date $t_0$ when the asset
price moves. The area between the two curves represents the gamma effect. We notice that
it is positive. For instance, we have $\Gamma(t_0) = 11.55\%$. We do not rebalance the portfolio
until time $t = t_0 + dt$ where $dt = 0.15$. The dashed curve indicates the value of the option
price¹⁰ at the date $t$. The area between $C(t, S(t))$ (dashed curve) and $C(t_0, S(t))$ (solid
curve) represents the theta effect. We notice that it is negative¹¹. In the second panel, we
have reported the resulting P&L. This is the difference between the first area (positive
gamma effect) and the second area (negative theta effect). We retrieve the results given in
the second panel in Figure 9.9.


⁹ We have also indicated the case (a') where the gamma is equal to zero. In this case, we obtain a gamma
neutral hedging portfolio and it is not necessary to adjust the hedging portfolio frequently.

¹⁰ We use the same parameters, except that the maturity is now equal to 0.10.

¹¹ We have $\Theta(t_0, S_0) = -7.09$.

[Figure panels: (a) Negative gamma & positive theta; (b) Positive gamma & negative theta; (a') Zero gamma & positive theta; horizontal axis: asset price]
FIGURE 9.9: P&L of the delta neutral hedging portfolio
FIGURE 9.10: Illustration of the configuration ($\Gamma > 0$, $\Theta < 0$)

9.1.1.4 The implied volatility


Definition In the Black-Scholes formula, all the parameters are objective except the
volatility $\sigma$. To calibrate this parameter, we can use a historical estimate $\hat{\sigma}$. However, the
option prices computed with the historical volatility $\hat{\sigma}$ do not fit the option prices observed
in the market. In practice, we use the Black-Scholes formula to deduce the implied volatility
that gives the market prices:

$$f_{BS}(S_0, K, \Sigma_{\mathrm{implied}}, T, b, r) = V(T, K)$$

where $f_{BS}$ is the Black-Scholes formula and $V(T, K)$ is the market price of the option, whose
maturity date is $T$ and whose strike is $K$. By convention, the implied volatility is denoted
by $\Sigma$, and is a function of the parameters¹² $T$ and $K$:

$$\Sigma_{\mathrm{implied}} = \Sigma(T, K)$$


Example 80 We consider a call option, whose maturity is one year. The current price of 
the underlying asset is normalized and is equal to 100. Moreover, the risk-free rate and the 
cost-of-carry parameter are equal to 5%. Below, we report the market price of European call 
options of three assets for several strikes: 


K           |    90     95     98    100    101    102    105    110
C_1 (T, K)  | 16.70  13.35  11.55  10.45   9.93   9.42   8.02   6.04
C_2 (T, K)  | 18.50  14.50  12.00  10.45   9.60   9.00   7.50   5.70
C_3 (T, K)  | 18.00  14.00  11.80  10.45   9.90   9.50   8.40   7.40

TABLE 9.4: Implied volatility Σ (T, K) (in %)

K           |    90     95     98    100    101    102    105    110
Σ_1 (T, K)  | 20.00  20.01  19.99  20.00  20.01  19.99  20.00  20.00
Σ_2 (T, K)  | 26.18  23.41  21.24  20.00  19.14  18.90  18.69  19.14
Σ_3 (T, K)  | 24.53  21.95  20.68  20.00  19.93  20.20  20.95  23.43


For each asset and each strike, we calculate $\Sigma(T, K)$ and report the results in Table 9.4
and Figure 9.11. For the first set $C_1$ of options, the implied volatility is constant. In the
case of the options $C_2$, the implied volatility is decreasing with respect to the strike $K$. In
the third case, the implied volatility is decreasing for in-the-money options and increasing
for out-of-the-money options.
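The inversion of the Black-Scholes formula has no closed form, so the implied volatility must be obtained numerically. The sketch below uses a plain bisection, which is our own choice of solver (the text does not specify one); it relies on the fact that the call price is increasing in the volatility. The function names are illustrative, and the market prices are those of Example 80.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma, b, r):
    """Black-Scholes call price with cost-of-carry b."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / k) + b * tau) / vol + 0.5 * vol
    d2 = d1 - vol
    return s * math.exp((b - r) * tau) * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)

def implied_vol(price, s, k, tau, b, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Solve f_BS(sigma) = price by bisection on [lo, hi] (assumed bracket)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, tau, mid, b, r) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

strikes = [90, 95, 98, 100, 101, 102, 105, 110]
market = {
    "C1": [16.70, 13.35, 11.55, 10.45, 9.93, 9.42, 8.02, 6.04],
    "C2": [18.50, 14.50, 12.00, 10.45, 9.60, 9.00, 7.50, 5.70],
    "C3": [18.00, 14.00, 11.80, 10.45, 9.90, 9.50, 8.40, 7.40],
}

smiles = {
    name: [implied_vol(p, 100.0, k, 1.0, 0.05, 0.05) for p, k in zip(prices, strikes)]
    for name, prices in market.items()
}
for name, vols in smiles.items():
    print(name, [round(100 * v, 2) for v in vols])
```

Because the market prices are quoted with two decimals, the recovered volatilities can differ from Table 9.4 in the last digit.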


Remark 99 When the curve of implied volatility is first decreasing and then increasing with
respect to the strike, it is called a volatility smile. When the curve of implied volatility is
just decreasing, it is called a volatility skew. If we also consider the maturity dimension, the
implied volatility as a function of both strike and maturity is known as the volatility surface.


¹² $\Sigma(T, K)$ also depends on the other parameters $S_0$, $b$ and $r$, but they are fixed values at the current
date $t_0$.

FIGURE 9.11: Volatility smile

Relationship between the implied volatility and the risk-neutral density Breeden
and Litzenberger (1978) showed that volatility smile and risk-neutral density are related.
Let $C_t(T, K)$ be the market price of the European call option at time $t$, whose maturity is
$T$ and strike is $K$. We have:

$$
\begin{aligned}
C_t(T, K) &= \mathbb{E}^{\mathbb{Q}}\left[ e^{-\int_t^T r\,ds} (S(T) - K)^+ \mid \mathcal{F}_t \right] \\
&= e^{-r(T-t)} \int_{-\infty}^{\infty} (S - K)^+ q_t(T, S)\,dS \\
&= e^{-r(T-t)} \int_K^{\infty} (S - K)\, q_t(T, S)\,dS
\end{aligned}
$$

where $q_t(T, S)$ is the risk-neutral probability density function of $S(T)$ at time $t$. By definition,
the risk-neutral cumulative distribution function $Q_t(T, S)$ is equal to¹³:

$$Q_t(T, S) = \int_{-\infty}^{S} q_t(T, x)\,dx$$

We deduce that:

$$\frac{\partial C_t(T, K)}{\partial K} = -e^{-r(T-t)} \int_K^{\infty} q_t(T, S)\,dS = -e^{-r(T-t)} \left(1 - Q_t(T, K)\right)$$

and:

$$\frac{\partial^2 C_t(T, K)}{\partial K^2} = e^{-r(T-t)} q_t(T, K)$$

¹³ We use the notations $Q_t(T, S)$ and $q_t(T, S)$ instead of $Q(S)$ and $q(S)$ because they will be convenient
when considering the local volatility model.

It follows that the risk-neutral cumulative distribution function is related to the first derivative
of the call option price:
$$Q_t(T, K) = \Pr\{S(T) \le K \mid \mathcal{F}_t\} = 1 + e^{r(T-t)} \cdot \partial_K C_t(T, K)$$


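We note $\Sigma_t(T, K)$ the volatility surface and $C_t^{\star}(T, K, \Sigma)$ the Black-Scholes formula. It
follows that:

$$Q_t(T, K) = 1 + e^{r(T-t)} \cdot \partial_K C_t^{\star}(T, K, \Sigma_t(T, K)) + e^{r(T-t)} \cdot \partial_\Sigma C_t^{\star}(T, K, \Sigma_t(T, K)) \cdot \partial_K \Sigma_t(T, K)$$

where:

$$\partial_K C_t^{\star}(T, K, \Sigma) = -e^{-r(T-t)} \cdot \Phi(d_2)$$

and:

$$\partial_\Sigma C_t^{\star}(T, K, \Sigma) = S(t) \cdot e^{(b-r)(T-t)} \cdot \sqrt{T-t} \cdot \phi(d_1)$$

with $d_1 = d_2 + \Sigma\sqrt{T-t}$.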
If we are interested in the risk-neutral probability density function, we obtain:

$$q_t(T, K) = \partial_K Q_t(T, K) = e^{r(T-t)} \cdot \partial_K^2 C_t(T, K)$$


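where:

$$
\begin{aligned}
\partial_K^2 C_t(T, K) ={}& \partial_K^2 C_t^{\star}(T, K, \Sigma_t) + 2 \cdot \partial_{K,\Sigma}^2 C_t^{\star}(T, K, \Sigma_t) \cdot \partial_K \Sigma_t(T, K) \\
&+ \partial_\Sigma C_t^{\star}(T, K, \Sigma_t) \cdot \partial_K^2 \Sigma_t(T, K) + \partial_\Sigma^2 C_t^{\star}(T, K, \Sigma_t) \cdot \left(\partial_K \Sigma_t(T, K)\right)^2
\end{aligned}
$$

and:

$$\partial_K^2 C_t^{\star}(T, K, \Sigma) = e^{-r(T-t)} \frac{\phi(d_2)}{K \Sigma \sqrt{T-t}}$$

$$\partial_{K,\Sigma}^2 C_t^{\star}(T, K, \Sigma) = e^{(b-r)(T-t)} \frac{S(t)\, d_1\, \phi(d_1)}{\Sigma K}$$

$$\partial_\Sigma^2 C_t^{\star}(T, K, \Sigma) = e^{(b-r)(T-t)} \frac{S(t)\, d_1 d_2 \sqrt{T-t}\; \phi(d_1)}{\Sigma}$$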
Example 81 We assume that S(t) = 100, T — t = 10, b = r = 5% and: 


£; (T, K) = 0.25 + In (1 +1078 (K — 90)? + 10-8 (K — 180)") 


In Figure 9.12, we have represented the volatility surface and the associated risk-neutral 
probability density function. In fact, they both contain the same information, but profes- 
sionals are more familiar with implied volatilities than risk-neutral probabilities. We have 
also reported the Black-Scholes risk-neutral distribution by considering the at-the-money 
implied volatility. We notice that the Black-Scholes model underestimates the probability 
of extreme events in this example. 
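The Breeden-Litzenberger relationships can be verified numerically by differentiating call prices with respect to the strike. The sketch below is a minimal illustration with our own assumptions: finite differences with a hypothetical step `h`, a flat 20% smile, a one-year maturity and $b = r$. In that flat case the result must coincide with the lognormal density $\phi(d_2)/(K\Sigma\sqrt{T-t})$ and the distribution function $\Phi(-d_2)$ of the Black-Scholes model; with a strike-dependent smile, one would simply pass a pricing function that evaluates the Black-Scholes formula at $\Sigma_t(T, K)$.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma, r):
    """Black-Scholes call price when b = r."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / k) + r * tau) / vol + 0.5 * vol
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d1 - vol)

def rn_cdf(call, k, tau, r, h=0.05):
    """Q_t(T, K) = 1 + e^{r tau} dC/dK, estimated by a central difference."""
    return 1.0 + math.exp(r * tau) * (call(k + h) - call(k - h)) / (2.0 * h)

def rn_density(call, k, tau, r, h=0.05):
    """q_t(T, K) = e^{r tau} d2C/dK2, estimated by a second difference."""
    return math.exp(r * tau) * (call(k + h) - 2.0 * call(k) + call(k - h)) / h**2

s, tau, r, sigma = 100.0, 1.0, 0.05, 0.20

def call(k):
    # Flat smile (assumption): the same volatility for every strike.
    return bs_call(s, k, tau, sigma, r)

k = 100.0
d2 = (math.log(s / k) + r * tau) / (sigma * math.sqrt(tau)) - 0.5 * sigma * math.sqrt(tau)
density_closed = norm_pdf(d2) / (k * sigma * math.sqrt(tau))
cdf_closed = norm_cdf(-d2)

print(rn_density(call, k, tau, r), density_closed)
print(rn_cdf(call, k, tau, r), cdf_closed)
```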
FIGURE 9.12: Risk-neutral probability density function

Robustness of the Black-Scholes formula El Karoui et al. (1998) assume that the
underlying price process is given by:
$$dS(t) = \mu(t) S(t)\,dt + \sigma(t) S(t)\,dW(t) \qquad (9.9)$$


whereas the trader hedges the call option with the implied volatility $\Sigma(T, K)$, meaning that
the risk-neutral process is:

$$dS(t) = rS(t)\,dt + \Sigma(T, K) S(t)\,dW^{\mathbb{Q}}(t) \qquad (9.10)$$


We reiterate that the dynamics of the replicating portfolio is:

$$
\begin{aligned}
dV(t) &= \phi(t)\,dS(t) + \psi(t)\,dB(t) \\
&= \phi(t)\,dS(t) + r\left(V(t) - \phi(t) S(t)\right)dt \\
&= rV(t)\,dt + \phi(t)\left(dS(t) - rS(t)\,dt\right)
\end{aligned}
$$
Since $C(t) = C(t, S(t))$, we also have:

$$dC(t) = \left( \partial_t C(t, S(t)) + \frac{1}{2}\sigma^2(t) S^2(t)\, \partial_S^2 C(t, S(t)) \right) dt + \partial_S C(t, S(t))\,dS(t)$$

Using the PDE (9.2), we notice that:

$$\partial_t C(t, S(t)) = rC(t, S(t)) - rS(t)\, \partial_S C(t, S(t)) - \frac{1}{2}\Sigma^2(T, K) S^2(t)\, \partial_S^2 C(t, S(t))$$
We deduce that:

$$dC(t) = rC(t, S(t))\,dt + \partial_S C(t, S(t)) \left(dS(t) - rS(t)\,dt\right) + \frac{1}{2}\left(\sigma^2(t) - \Sigma^2(T, K)\right) S^2(t)\, \partial_S^2 C(t, S(t))\,dt$$


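We consider the hedging error defined by:

$$e(t) = V(t) - C(t)$$

Since $\phi(t) = \partial_S C(t, S(t))$, we have:

$$
\begin{aligned}
de(t) &= dV(t) - dC(t) \\
&= rV(t)\,dt + \phi(t)\left(dS(t) - rS(t)\,dt\right) - rC(t)\,dt - \partial_S C(t, S(t)) \left(dS(t) - rS(t)\,dt\right) \\
&\quad + \frac{1}{2}\left(\Sigma^2(T, K) - \sigma^2(t)\right) S^2(t)\, \partial_S^2 C(t, S(t))\,dt \\
&= r e(t)\,dt + \frac{1}{2}\left(\Sigma^2(T, K) - \sigma^2(t)\right) S^2(t)\, \partial_S^2 C(t, S(t))\,dt
\end{aligned}
$$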


We deduce that¹⁴:
$$V(T) - C(T) = \frac{1}{2} \int_0^T e^{r(T-t)}\, \Gamma(t) \left(\Sigma^2(T, K) - \sigma^2(t)\right) S^2(t)\,dt \qquad (9.11)$$


This equation is known as the robustness formula of Black-Scholes hedging (El Karoui et al.,
1998). Formula (9.11) is one of the most important results of this chapter. Indeed, since the
gamma coefficient of a call option is always positive, we can obtain an almost surely positive
P&L if the implied volatility is larger than the realized volatility and if there is no jump. More
generally, the previous result is valid for all types of European options:


$$V(T) - f(S(T)) = \frac{1}{2} \int_0^T e^{r(T-t)}\, \Gamma(t) \left(\Sigma^2(T, K) - \sigma^2(t)\right) S^2(t)\,dt \qquad (9.12)$$


where $f(S(T))$ is the payoff of the option. We obtain the following results:
• if $\Gamma(t) > 0$, a positive P&L is achieved by overestimating the realized volatility:
$$\Sigma(T, K) \ge \sigma(t) \Longrightarrow V(T) \ge f(S(T))$$
• if $\Gamma(t) < 0$, a positive P&L is achieved by underestimating the realized volatility:
$$\Sigma(T, K) \le \sigma(t) \Longrightarrow V(T) \ge f(S(T))$$
• the variance of the hedging error is an increasing function of the absolute value of the
gamma coefficient:
$$|\Gamma(t)| \nearrow \;\Longrightarrow\; \mathrm{var}\left(V(T) - f(S(T))\right) \nearrow$$


In terms of model risk, the robustness formula highlights the role of the implied volatility,
the realized volatility and the gamma coefficient. An important issue concerns the case when
the gamma can be positive and negative and changes sign during the life of the option. We
cannot then control the P&L by using a lower or an upper bound for the implied volatility¹⁵.


¹⁴ Because we have $e(0) = V(0) - C(0) = 0$.
¹⁵ This issue is solved on page 530.

Example 82 We consider the replication of 100 ATM call options. The current price of the
asset is 100 and the maturity of the option is 6 months (or 130 trading days). We consider
the following parameters: $b = r = 5\%$. We rebalance the delta hedging portfolio every trading
day. Moreover, we assume that the option is priced and hedged with a 20% implied volatility.


Figure 9.13 represents the density function of the hedging ratio $\pi$. In the case where the
realized volatility $\sigma(t)$ is equal to the implied volatility, we retrieve the previous results:
$\pi$ is centered around zero. However, if the realized volatility $\sigma(t)$ is below (or above) the
implied volatility, $\pi$ is shifted to the right (or the left). If $\sigma(t) < \Sigma$, then there is a higher
probability that the trader makes a profit. In our example, we obtain:

$$\Pr\{\pi > 0 \mid \Sigma = 20\%, \sigma = 15\%\} = 99.04\%$$

and:

$$\Pr\{\pi > 0 \mid \Sigma = 20\%, \sigma = 25\%\} = 0.09\%$$
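The order of magnitude of these probabilities can be reproduced with a rough simulation. The Python sketch below is only indicative: the helper names, the number of paths and the seed are hypothetical, the cash account is assumed to accrue continuously at rate $r$, and the estimates will not match the figures above exactly. The key point is that the asset path is generated with the realized volatility, whereas the premium and the deltas are computed with the implied volatility.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma, r):
    """Black-Scholes call price and delta when b = r."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / k) + r * tau) / vol + 0.5 * vol
    price = s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d1 - vol)
    return price, norm_cdf(d1)

def hedging_ratio(realized_sigma, implied_sigma, rng,
                  n_steps=130, s0=100.0, k=100.0, T=0.5, r=0.05):
    """Daily delta hedge priced with implied_sigma on a path driven by realized_sigma."""
    dt = T / n_steps
    premium, delta = bs_call(s0, k, T, implied_sigma, r)
    s, cash = s0, premium - delta * s0
    for i in range(1, n_steps + 1):
        z = rng.gauss(0.0, 1.0)
        s *= math.exp((r - 0.5 * realized_sigma**2) * dt
                      + realized_sigma * math.sqrt(dt) * z)
        cash *= math.exp(r * dt)
        if i < n_steps:
            _, new_delta = bs_call(s, k, T - i * dt, implied_sigma, r)
            cash -= (new_delta - delta) * s
            delta = new_delta
    return (delta * s + cash - max(s - k, 0.0)) / premium

def prob_profit(realized_sigma, implied_sigma=0.20, n_paths=500, seed=1):
    """Monte Carlo estimate of Pr{pi > 0} (hypothetical helper)."""
    rng = random.Random(seed)
    wins = sum(hedging_ratio(realized_sigma, implied_sigma, rng) > 0.0
               for _ in range(n_paths))
    return wins / n_paths

for vol in (0.15, 0.20, 0.25):
    print(vol, prob_profit(vol))
```

Since the gamma of the call is positive, formula (9.11) predicts a shift of $\pi$ to the right when the realized volatility is 15% and to the left when it is 25%, which is what the estimated probabilities should reflect.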


FIGURE 9.13: Hedging error when the implied volatility is 20% (horizontal axis: π in %)


9.1.2 Interest rate risk modeling 


Even if the Vasicek model is not used today by practitioners, it is interesting to study it
in order to understand the calibration challenge when considering fixed income derivatives.
Indeed, in the Black-Scholes model, the calibration consists in estimating a small number
of parameters and the main issue concerns the implied volatility. We will see that pricing
exotic fixed income derivatives is a more difficult task, because the choice of the risk factors
is not obvious and may depend on the tractability of the pricing model¹⁶.

¹⁶ We invite the reader to refer to the book of Brigo and Mercurio (2006) for a more comprehensive
presentation on the pricing of fixed income derivatives.

9.1.2.1 Pricing zero-coupon bonds with the Vasicek model


Vasicek (1977) assumes that the state variable is the instantaneous interest rate and
follows an Ornstein-Uhlenbeck process:

$$
\begin{cases}
dr(t) = a\left(b - r(t)\right)dt + \sigma\,dW(t) \\
r(t_0) = r_0
\end{cases}
$$

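We recall that a zero-coupon bond is a bond that pays \$1 at the maturity date $T$. Therefore,
we have $V(T, r) = 1$ if we note $V(t, r)$ the price of the zero-coupon bond at time $t$ when
the interest rate $r(t)$ is equal to $r$. The corresponding partial differential equation becomes
then:

$$\frac{1}{2}\sigma^2 \frac{\partial^2 V(t, r)}{\partial r^2} + \left(a\left(b - r(t)\right) - \lambda(t)\sigma\right) \frac{\partial V(t, r)}{\partial r} + \frac{\partial V(t, r)}{\partial t} - r(t) V(t, r) = 0$$

By applying the Feynman-Kac representation theorem, we deduce that:

$$V(0, r_0) = \mathbb{E}^{\mathbb{Q}}\left[ e^{-\int_0^T r(t)\,dt} \mid \mathcal{F}_0 \right] \qquad (9.13)$$

where the risk-neutral dynamic of $r(t)$ is:

$$
\begin{cases}
dr(t) = \left(a\left(b - r(t)\right) - \lambda(t)\sigma\right)dt + \sigma\,dW^{\mathbb{Q}}(t) \\
r(t_0) = r_0
\end{cases}
$$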

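Vasicek (1977) assumes that the risk price of the Wiener process is constant: $\lambda(t) = \lambda$. It
follows that the risk-neutral dynamic of $r(t)$ is an Ornstein-Uhlenbeck process:

$$
\begin{cases}
dr(t) = a\left(b' - r(t)\right)dt + \sigma\,dW^{\mathbb{Q}}(t) \\
r(t_0) = r_0
\end{cases}
$$

where:

$$b' = b - \frac{\lambda\sigma}{a}$$

We note $Z = \int_0^T r(t)\,dt$. In Exercise 9.4.2 on page 593, we show that $Z$ is a Gaussian
random variable where:

$$\mathbb{E}^{\mathbb{Q}}[Z] = b'T + (r_0 - b') \frac{1 - e^{-aT}}{a}$$

and:

$$\mathrm{var}^{\mathbb{Q}}(Z) = \frac{\sigma^2}{a^2}\left(T - \beta - \frac{a\beta^2}{2}\right)$$

We deduce that:

$$
\begin{aligned}
V(0, r_0) &= \mathbb{E}^{\mathbb{Q}}\left[e^{-Z} \mid \mathcal{F}_0\right] \\
&= \exp\left(-\mathbb{E}^{\mathbb{Q}}[Z] + \frac{1}{2}\mathrm{var}^{\mathbb{Q}}(Z)\right) \\
&= \exp\left(-r_0\beta - \left(b' - \frac{\sigma^2}{2a^2}\right)(T - \beta) - \frac{\sigma^2\beta^2}{4a}\right)
\end{aligned}
$$

where:

$$\beta = \frac{1 - e^{-aT}}{a}$$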
If we use the standard notation $B(t, T)$, we have $B(t, T) = V(T - t, r(t))$. We recall
that the zero-coupon rate $R(t, T)$ is defined by:

$$B(t, T) = e^{-(T-t) R(t, T)}$$

We deduce that:

$$R(t, T) = -\frac{1}{T - t} \ln B(t, T)$$

Since we have:

$$R_\infty = \lim_{T \to \infty} R(t, T) = b' - \frac{\sigma^2}{2a^2}$$

the zero-coupon rate has the following expression:

$$R(t, T) = R_\infty + (r_t - R_\infty) \left(\frac{1 - e^{-a(T-t)}}{a(T-t)}\right) + \frac{\sigma^2 \left(1 - e^{-a(T-t)}\right)^2}{4a^3 (T-t)} \qquad (9.14)$$


The yield curve can take three different forms (Figure 9.14). Vasicek (1977) shows that the
curve is increasing if $r_t < R_\infty - \frac{\sigma^2}{4a^2}$ and decreasing if $r_t > R_\infty + \frac{\sigma^2}{2a^2}$. Otherwise, it is a
bell curve.
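The three shapes can be illustrated with a short Python sketch of Equation (9.14). The function names are ours, and we use the parameters of Figure 9.14 ($a = 2.5$, $b = 6\%$, $\sigma = 5\%$) under the additional assumption that $b' = b$, i.e. a zero market price of risk; the three initial rates are hypothetical values chosen to fall in each regime.

```python
import math

def vasicek_zero_rate(r_t, tau, a, b_prime, sigma):
    """Zero-coupon rate R(t, T) for a time to maturity tau = T - t > 0."""
    r_inf = b_prime - sigma**2 / (2.0 * a**2)
    g = 1.0 - math.exp(-a * tau)
    return r_inf + (r_t - r_inf) * g / (a * tau) + sigma**2 * g**2 / (4.0 * a**3 * tau)

def vasicek_bond_price(r_t, tau, a, b_prime, sigma):
    """Zero-coupon bond price B(t, T) = exp(-(T - t) R(t, T))."""
    return math.exp(-tau * vasicek_zero_rate(r_t, tau, a, b_prime, sigma))

def curve_shape(r_t, a, b_prime, sigma):
    """Classify the yield curve with the thresholds stated above."""
    r_inf = b_prime - sigma**2 / (2.0 * a**2)
    if r_t < r_inf - sigma**2 / (4.0 * a**2):
        return "increasing"
    if r_t > r_inf + sigma**2 / (2.0 * a**2):
        return "decreasing"
    return "bell"

a, b_prime, sigma = 2.5, 0.06, 0.05       # b' = b is an assumption (lambda = 0)
maturities = [0.25 * i for i in range(1, 81)]

for r_t in (0.02, 0.0599, 0.10):          # hypothetical initial short rates
    curve = [vasicek_zero_rate(r_t, tau, a, b_prime, sigma) for tau in maturities]
    print(r_t, curve_shape(r_t, a, b_prime, sigma),
          round(100 * curve[0], 3), round(100 * curve[-1], 3))
```

Whatever the initial rate, the long end of the curve converges to $R_\infty$, which is slightly below $b'$ because of the convexity term $\sigma^2 / (2a^2)$.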


FIGURE 9.14: Vasicek model ($a = 2.5$, $b = 6\%$ and $\sigma = 5\%$; vertical axis: $R(t,T)$ in %)
Let $F(t, T_1, T_2)$ be the forward rate at time $t$ for the period $[T_1, T_2]$. It verifies the
following relationship:

$$B(t, T_2) = e^{-(T_2 - T_1) F(t, T_1, T_2)} B(t, T_1)$$

We deduce that the expression of $F(t, T_1, T_2)$ is:

$$F(t, T_1, T_2) = -\frac{1}{T_2 - T_1} \ln \frac{B(t, T_2)}{B(t, T_1)}$$

It follows that the instantaneous forward rate is given by this equation¹⁷:

$$f(t, T) = F(t, T, T) = -\frac{\partial \ln B(t, T)}{\partial T}$$


Using Equation (9.14), we deduce another expression of the price of the zero-coupon bond:

$$B(t, T) = \exp\left(-(T - t) R_\infty - (r_t - R_\infty) \left(\frac{1 - e^{-a(T-t)}}{a}\right) - \frac{\sigma^2 \left(1 - e^{-a(T-t)}\right)^2}{4a^3}\right)$$

Therefore, the instantaneous forward rate in the Vasicek model is:

$$f(t, T) = R_\infty + (r_t - R_\infty)\, e^{-a(T-t)} + \frac{\sigma^2 \left(1 - e^{-a(T-t)}\right) e^{-a(T-t)}}{2a^2}$$


Remark 100 Forward rates are interest rates that are locked in forward rate agreements
(FRA). A FRA involves two dates: $T_1$ is the start of the period the rate will be fixed for, and $T_2$
is the maturity date of the FRA. $T_2 - T_1$ is the maturity of the locked interest rate. It is also
called the tenor of the interest rate that is being fixed. Therefore, $F(t, T_1, T_2)$ is the forward
value of the spot rate $R(t, T_2 - T_1)$.


9.1.2.2 The calibration issue of the yield curve 


Hull and White (1990) propose to extend the Vasicek model by considering that the
three parameters $a$, $b$ and $\sigma$ are deterministic functions of time. Under the risk-neutral
probability measure, the dynamics of the interest rate is then:

$$dr(t) = a(t)\left(b(t) - r(t)\right)dt + \sigma(t)\,dW^{\mathbb{Q}}(t)$$

The underlying idea is to fit the term structure of interest rates and other quantities, such
as the term structure of spot volatilities. However, the generalized Vasicek model produces
unrealistic volatility term structures. Therefore, Hull and White (1994) focused on this
extension:

$$
\begin{aligned}
dr(t) &= a\left(b(t) - r(t)\right)dt + \sigma\,dW^{\mathbb{Q}}(t) \\
&= \left(\theta(t) - a\,r(t)\right)dt + \sigma\,dW^{\mathbb{Q}}(t)
\end{aligned}
$$

where $\theta(t) = a \cdot b(t)$. If we want to fit exactly the yield curve, we can consider arbitrary
values for the parameters $a$ and $\sigma$, because the calibration of the yield curve is done by the
time-varying mean-reverting parameter:

$$\theta(t) = \partial_t f(0, t) + a f(0, t) + \frac{\sigma^2}{2a}\left(1 - e^{-2at}\right)$$

or:

$$b(t) = f(0, t) + \frac{1}{a}\partial_t f(0, t) + \frac{\sigma^2}{2a^2}\left(1 - e^{-2at}\right) \qquad (9.15)$$

¹⁷ We also notice that $B(t, T)$ can be expressed in terms of instantaneous forward rates:

$$B(t, T) = e^{-\int_t^T f(t, u)\,du}$$


We notice that $b(t)$ depends on the instantaneous forward rate, which is (up to the sign) the first
derivative of the logarithm of the zero-coupon bond price with respect to the maturity.


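Example 83 We assume that the zero-coupon rates are given by the Nelson-Siegel model
with $\theta_1 = 5.5\%$, $\theta_2 = 0.5\%$, $\theta_3 = -4.5\%$ and $\theta_4 = 1.8$.

We reiterate that the spot rate $R(t, T)$ in the Nelson-Siegel model is equal to:

$$R(t, T) = \theta_1 + \theta_2 \left(\frac{1 - e^{-(T-t)/\theta_4}}{(T-t)/\theta_4}\right) + \theta_3 \left(\frac{1 - e^{-(T-t)/\theta_4}}{(T-t)/\theta_4} - e^{-(T-t)/\theta_4}\right)$$

We deduce that the instantaneous forward rate corresponds to the following expression:

$$f(t, T) = \theta_1 + \theta_2\, e^{-(T-t)/\theta_4} + \theta_3 \frac{(T-t)}{\theta_4}\, e^{-(T-t)/\theta_4}$$

For the slope, we have:

$$\frac{\partial f(t, T)}{\partial T} = \left(\frac{\theta_3 - \theta_2}{\theta_4} - \frac{\theta_3 (T-t)}{\theta_4^2}\right) e^{-(T-t)/\theta_4}$$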


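Fitting exactly the Nelson-Siegel yield curve is then equivalent to defining the time-varying
mean-reverting parameter $b(t)$ of the extended Vasicek model as follows:

$$
\begin{aligned}
b(t) &= \theta_1 + \theta_2\, e^{-t/\theta_4} + \theta_3 \frac{t}{\theta_4}\, e^{-t/\theta_4} + \frac{1}{a}\left(\frac{\theta_3 - \theta_2}{\theta_4} - \frac{\theta_3 t}{\theta_4^2}\right) e^{-t/\theta_4} + \frac{\sigma^2}{2a^2}\left(1 - e^{-2at}\right) \\
&= \theta_1 + \left(\theta_2 + \frac{\theta_3 - \theta_2}{a\theta_4} + \frac{\theta_3 t}{\theta_4}\left(1 - \frac{1}{a\theta_4}\right)\right) e^{-t/\theta_4} + \frac{\sigma^2}{2a^2}\left(1 - e^{-2at}\right)
\end{aligned}
$$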

In Figure 9.15, we have represented the yield curve obtained with the Nelson-Siegel model
in the top/left panel. We have also reported the curve of instantaneous forward rates in
the top/right panel. The bottom/left panel corresponds to the time-varying mean-reverting
parameter $b(t)$. We have used three sets of parameters $(a, \sigma)$. Finally, we have recalculated
the yield curve of the extended Vasicek model in the bottom/right panel. We retrieve the
original yield curve. We can compare this solution with those obtained by minimizing the
sum of the squared residuals:

$$(\hat{r}_0, \hat{a}, \hat{b}, \hat{\sigma}) = \arg\min \sum_i \left(R^{NS}(t, T_i) - R(t, T_i; r_0, a, b, \sigma)\right)^2$$

where $R^{NS}(t, T_i)$ is the Nelson-Siegel spot rate, $R(t, T_i; r_0, a, b, \sigma)$ is the theoretical spot rate
of the Vasicek model and $i$ denotes the $i$-th observation. By considering all the maturities
between zero and twenty years with a step of one month, we obtain $\hat{r}_0 = 6\%$, $\hat{a} = 16.88$,
$\hat{b} = 7.47\%$ and $\hat{\sigma} = 3.91\%$. Unfortunately, the fitted Vasicek model (curve #2) does not
reproduce the original yield curve contrary to the fitted extended Vasicek model (curve #1).


[Figure panels: Nelson-Siegel yield curve (in %); Instantaneous forward rate (in %); Mean-reversion parameter b(t) (in %); Calibrated Vasicek yield curve (in %), with curves #1 (extended Vasicek) and #2 (Vasicek); horizontal axes: maturity (in years)]

FIGURE 9.15: Calibration of the Vasicek model


The yield curve is not the only market information to calibrate. More generally, the
calibration set of an interest rate model also includes caplets, floorlets and swaptions (Brigo
and Mercurio, 2006). This explains why pricing interest rate exotic options is more difficult
than pricing equity exotic options, and why one-factor models based on the short rate are not
sufficient: they cannot be calibrated to caps, floors and swaptions.


9.1.2.3 Caps, floors and swaptions 


We consider a number of future dates $T_0, T_1, \ldots, T_n$, and we assume that the period
between two dates $T_i$ and $T_{i-1}$ is approximately constant (e.g. 3M or 6M). A caplet is
the analog of a call option, whose underlying asset is a forward rate. It is defined by
the payoff $(T_i - T_{i-1}) \left(F(T_{i-1}, T_{i-1}, T_i) - K\right)^+$, where $K$ is the strike of the caplet and
$F(T_{i-1}, T_{i-1}, T_i)$ is the forward rate at the future date $T_{i-1}$. $\delta_{i-1} = T_i - T_{i-1}$ is then the
tenor of the caplet, $T_{i-1}$ is the resetting date (or the fixing date) of the forward rate whereas
$T_i$ is the maturity date of the caplet. A cap is a portfolio of successive caplets¹⁸:

$$\mathrm{Cap}(t) = \sum_{i=1}^{n} \mathrm{Caplet}(t, T_{i-1}, T_i)$$

Similarly, a floor is a portfolio of successive floorlets:

$$\mathrm{Floor}(t) = \sum_{i=1}^{n} \mathrm{Floorlet}(t, T_{i-1}, T_i)$$

where the payoff of the floorlet is $(T_i - T_{i-1}) \left(K - F(T_{i-1}, T_{i-1}, T_i)\right)^+$.
A par swap rate is the fixed rate of an interest rate swap¹⁹:

$$\mathrm{Sw}(t) = \frac{B(t, T_0) - B(t, T_n)}{\sum_{i=1}^{n} (T_i - T_{i-1}) \cdot B(t, T_i)}$$

Then, we define the payoff of a payer swaption as²⁰:

$$\left(\mathrm{Sw}(T_0) - K\right)^+ \sum_{i=1}^{n} (T_i - T_{i-1})\, B(T_0, T_i)$$

where $\mathrm{Sw}(T_0)$ is the forward swap rate.

¹⁸ We have $t < T_0$.


Remark 101 Generally, caps, floors and swaptions are written on the Libor rate, which is
defined as a simple forward rate:

\[ L(t, T_{i-1}, T_i) = \frac{1}{T_i - T_{i-1}} \left( \frac{B(t, T_{i-1})}{B(t, T_i)} - 1 \right) \]
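The definitions of the Libor rate and the par swap rate can be checked numerically. The sketch below is only illustrative: the flat 3% continuously compounded curve, the semi-annual schedule and the function names `libor` and `par_swap_rate` are assumptions, not values or notation from the text. It verifies that the par swap rate equals the floating leg, written with forward Libor rates, divided by the annuity.

```python
from math import exp

def libor(bond_start, bond_end, delta):
    """Simple forward rate L(t, T_{i-1}, T_i) from two zero-coupon bond prices."""
    return (bond_start / bond_end - 1.0) / delta

def par_swap_rate(bonds, times):
    """Return (Sw(t), annuity) from bond prices B(t, T_0), ..., B(t, T_n)."""
    annuity = sum((times[i] - times[i - 1]) * bonds[i] for i in range(1, len(times)))
    return (bonds[0] - bonds[-1]) / annuity, annuity

# Hypothetical market: flat 3% continuously compounded curve, dates 0.5Y, 1Y, ..., 3Y
times = [0.5 * k for k in range(1, 7)]
bonds = [exp(-0.03 * T) for T in times]

swap_rate, annuity = par_swap_rate(bonds, times)

# Floating leg expressed with forward Libor rates: sum of delta_{i-1} * L_{i-1} * B(t, T_i)
floating_leg = sum(
    (times[i] - times[i - 1])
    * libor(bonds[i - 1], bonds[i], times[i] - times[i - 1])
    * bonds[i]
    for i in range(1, len(times))
)

print(swap_rate, floating_leg / annuity)
```

With a flat curve, all forward Libor rates coincide, so the swap rate is simply that common simple rate.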


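In order to price these interest rate products, we can use the risk-neutral probability
measure $\mathbb{Q}$, and we have²¹:

\[ \mathrm{Caplet}(t, T_{i-1}, T_i) = \mathbb{E}^{\mathbb{Q}} \left[ e^{-\int_t^{T_i} r(s)\,ds} \, \delta_{i-1} \left( L(T_{i-1}, T_{i-1}, T_i) - K \right)^+ \,\middle|\, \mathcal{F}_t \right] \]

and:

\[ \mathrm{Swaption}(t) = \mathbb{E}^{\mathbb{Q}} \left[ e^{-\int_t^{T_0} r(s)\,ds} \left( \mathrm{Sw}(T_0) - K \right)^+ \sum_{i=1}^{n} \delta_{i-1} B(T_0, T_i) \,\middle|\, \mathcal{F}_t \right] \]

We face here a problem because the discount factor is stochastic and is not independent
from the forward rate $L(T_{i-1}, T_{i-1}, T_i)$ or the forward swap rate $\mathrm{Sw}(T_0)$. Therefore, the
risk-neutral transform does not help to price interest rate derivatives.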


9.1.2.4 Change of numéraire and equivalent martingale measure 


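We recall that the price of the contingent claim, whose payoff is $V(T) = f(S(T))$ at
time $T$, is given by:

\[ V(0) = \mathbb{E}^{\mathbb{Q}} \left[ e^{-\int_0^T r(s)\,ds} \, V(T) \,\middle|\, \mathcal{F}_0 \right] \]

where $\mathbb{Q}$ is the risk-neutral probability measure. We can rewrite this equation as follows:

\[ \frac{V(0)}{M(0)} = \mathbb{E}^{\mathbb{Q}} \left[ \frac{V(T)}{M(T)} \,\middle|\, \mathcal{F}_0 \right] \tag{9.16} \]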


19 $T_0 = t$ corresponds to a spot swap, whereas $T_0 > t$ corresponds to a forward start swap.
20 The payoff of a receiver swaption is:

\[ \left( K - \mathrm{Sw}(T_0) \right)^+ \sum_{i=1}^{n} (T_i - T_{i-1}) \, B(T_0, T_i) \]

21 We recall that $\delta_{i-1}$ is equal to $T_i - T_{i-1}$.

where²²:

\[ M(t) = \exp \left( \int_0^t r(s)\,ds \right) \]

Under the probability measure $\mathbb{Q}$, we know that $\tilde{V}(t) = V(t) / M(t)$ is an $\mathcal{F}_t$-martingale.
The money market account $M(t)$ is then the numéraire when the martingale measure is the
risk-neutral probability measure²³, but other numéraires can be used in order to simplify
pricing problems:


“The use of the risk-neutral probability measure has proved to be very powerful 
for computing the prices of contingent claims [...] We show here that many 
other probability measures can be defined in the same way to solve different 
asset-pricing problems, in particular option pricing. Moreover, these probability 
measure changes are in fact associated with numéraire changes” (Geman et al., 
1995, page 443). 


Let us consider another numéraire N (t) > 0 and the associated probability measure given 
by the Radon-Nikodym derivative: 


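\[ \frac{d\mathbb{Q}^\star}{d\mathbb{Q}} = \frac{N(T) / N(0)}{M(T) / M(0)} = e^{-\int_0^T r(s)\,ds} \cdot \frac{N(T)}{N(0)} \]

We have:

\[
\begin{aligned}
\mathbb{E}^{\mathbb{Q}^\star} \left[ \frac{V(T)}{N(T)} \,\middle|\, \mathcal{F}_0 \right]
&= \mathbb{E}^{\mathbb{Q}} \left[ \frac{V(T)}{N(T)} \cdot \frac{d\mathbb{Q}^\star}{d\mathbb{Q}} \,\middle|\, \mathcal{F}_0 \right] \\
&= \frac{M(0)}{N(0)} \, \mathbb{E}^{\mathbb{Q}} \left[ \frac{V(T)}{M(T)} \,\middle|\, \mathcal{F}_0 \right] \\
&= \frac{M(0)}{N(0)} \, V(0)
\end{aligned}
\]

Since $M(0) = 1$, we deduce that:

\[ \frac{V(0)}{N(0)} = \mathbb{E}^{\mathbb{Q}^\star} \left[ \frac{V(T)}{N(T)} \,\middle|\, \mathcal{F}_0 \right] \tag{9.17} \]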


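We notice that Equation (9.17) is similar to Equation (9.16), except that we have changed
the numéraire ($M(t) \to N(t)$) and the probability measure ($\mathbb{Q} \to \mathbb{Q}^\star$). More generally, we
have:

\[ V(t) = N(t) \cdot \mathbb{E}^{\mathbb{Q}^\star} \left[ \frac{V(T)}{N(T)} \,\middle|\, \mathcal{F}_t \right] \]

Thanks to Girsanov theorem, we also notice that $e^{-\int_0^t r(s)\,ds} N(t)$ is an $\mathcal{F}_t$-martingale.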


Example 84 The forward numéraire is the zero-coupon bond price of maturity T: 
N(t)=B(t,T) 


In this case, the probability measure is called the forward probability and is denoted by 
Q* (T). This martingale measure has been originally used by Jamshidian (1989) for pricing 
bond options with the Vasicek model. Another important result is that forward rates are 
martingales under the forward probability measure (Brigo and Mercurio, 2006). 


22We note that M (0) = 1. 
23 M (t) is also called the spot numéraire. 




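By noticing that $N(T) = B(T, T) = 1$, Equation (9.17) becomes:

\[ V(t) = B(t, T) \, \mathbb{E}^{\mathbb{Q}^\star(T)} \left[ V(T) \,\middle|\, \mathcal{F}_t \right] \]

For instance, in the case of a caplet, we obtain:

\[
\begin{aligned}
\mathrm{Caplet}(t, T_{i-1}, T_i)
&= \delta_{i-1} \, \mathbb{E}^{\mathbb{Q}} \left[ \frac{M(t)}{M(T_i)} \left( L(T_{i-1}, T_{i-1}, T_i) - K \right)^+ \,\middle|\, \mathcal{F}_t \right] \\
&= \delta_{i-1} \, \mathbb{E}^{\mathbb{Q}^\star(T_i)} \left[ \frac{N(t)}{N(T_i)} \left( L(T_{i-1}, T_{i-1}, T_i) - K \right)^+ \,\middle|\, \mathcal{F}_t \right] \\
&= \delta_{i-1} \, B(t, T_i) \, \mathbb{E}^{\mathbb{Q}^\star(T_i)} \left[ \left( L(T_{i-1}, T_{i-1}, T_i) - K \right)^+ \,\middle|\, \mathcal{F}_t \right]
\end{aligned}
\]

where $L(t, T_{i-1}, T_i)$ is an $\mathcal{F}_t$-martingale under the forward probability measure $\mathbb{Q}^\star(T_i)$. If
we use the standard Black model, we obtain: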


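\[ \mathrm{Caplet}(t, T_{i-1}, T_i) = \delta_{i-1} \, B(t, T_i) \left( L(t, T_{i-1}, T_i) \, \Phi(d_1) - K \, \Phi(d_2) \right) \tag{9.18} \]

where²⁴:

\[ d_1 = \frac{1}{\sigma_{i-1} \sqrt{T_{i-1} - t}} \ln \frac{L(t, T_{i-1}, T_i)}{K} + \frac{1}{2} \sigma_{i-1} \sqrt{T_{i-1} - t} \]

and:

\[ d_2 = d_1 - \sigma_{i-1} \sqrt{T_{i-1} - t} \]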
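The Black formula (9.18) translates directly into code. The following is a minimal sketch evaluated at $t = 0$; the function names and the numerical inputs (a 3M fixing date, a 6M payment date, a 3.05% forward rate, a 3% strike, a 5% volatility and a continuously compounded discount rate of 3.05%) are assumptions chosen for illustration.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_caplet(libor, strike, sigma, t_fix, delta, bond, notional=1.0):
    """Black price at t = 0 of a caplet fixing at t_fix and paying at t_fix + delta.

    libor : current forward rate L(0, T_{i-1}, T_i)
    bond  : zero-coupon bond price B(0, T_i)
    """
    total_vol = sigma * sqrt(t_fix)
    d1 = log(libor / strike) / total_vol + 0.5 * total_vol
    d2 = d1 - total_vol
    return notional * delta * bond * (libor * norm_cdf(d1) - strike * norm_cdf(d2))

# Illustrative inputs (assumptions, see the lead-in)
price = black_caplet(libor=0.0305, strike=0.03, sigma=0.05, t_fix=0.25,
                     delta=0.25, bond=exp(-0.5 * 0.0305), notional=1e6)
print(round(price, 2))
```

The price increases with the volatility, which is what makes the inversion into an implied volatility well defined.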


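If we consider other models, the general formula of the caplet price is²⁵:

\[ \mathrm{Caplet}(t, T_{i-1}, T_i) = B(t, T_i) \, \mathbb{E}^{\mathbb{Q}^\star(T_i)} \left[ \left( \frac{1}{B(T_{i-1}, T_i)} - (1 + \delta_{i-1} K) \right)^+ \,\middle|\, \mathcal{F}_t \right] \]


Example 85 The annuity numéraire is equal to:

\[ N(t) = \sum_{i=1}^{n} (T_i - T_{i-1}) \, B(t, T_i) \]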


Since the forward swap rate is a martingale under the annuity probability measure $\mathbb{Q}^\star$, the
annuity numéraire is used to price a swaption (Brigo and Mercurio, 2006).


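24 $\sigma_{i-1}$ is the volatility of the Libor rate $L(t, T_{i-1}, T_i)$.

25 We have:

\[
\delta_{i-1} \left( L(t, T_{i-1}, T_i) - K \right)^+
= \left( \frac{B(t, T_{i-1})}{B(t, T_i)} - (1 + \delta_{i-1} K) \right)^+
= \frac{\left( B(t, T_{i-1}) - (1 + \delta_{i-1} K) \, B(t, T_i) \right)^+}{B(t, T_i)}
\]

and:

\[ \delta_{i-1} \left( L(T_{i-1}, T_{i-1}, T_i) - K \right)^+ = \left( \frac{1}{B(T_{i-1}, T_i)} - (1 + \delta_{i-1} K) \right)^+ \]

We deduce the following pricing formula for the swaption:

\[
\begin{aligned}
\mathrm{Swaption}(t)
&= \mathbb{E}^{\mathbb{Q}} \left[ \frac{M(t)}{M(T_0)} \left( \mathrm{Sw}(T_0) - K \right)^+ \sum_{i=1}^{n} \delta_{i-1} B(T_0, T_i) \,\middle|\, \mathcal{F}_t \right] \\
&= \mathbb{E}^{\mathbb{Q}^\star} \left[ \frac{N(t)}{N(T_0)} \left( \mathrm{Sw}(T_0) - K \right)^+ \sum_{i=1}^{n} \delta_{i-1} B(T_0, T_i) \,\middle|\, \mathcal{F}_t \right] \\
&= N(t) \, \mathbb{E}^{\mathbb{Q}^\star} \left[ \left( \mathrm{Sw}(T_0) - K \right)^+ \,\middle|\, \mathcal{F}_t \right] \qquad (9.19) \\
&= N(t) \, \mathbb{E}^{\mathbb{Q}^\star} \left[ \left( \frac{1 - B(T_0, T_n)}{N(T_0)} - K \right)^+ \,\middle|\, \mathcal{F}_t \right]
\end{aligned}
\]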


Using Equation (9.19), we can also find a Black formula for the swaption, in exactly the same
way as for caps and floors. However, we face here an issue. Indeed, this amounts to assuming
that all the forward rates are log-normal under the different forward probability measures
$\mathbb{Q}^\star(T_i)$ and that the swap rates are also log-normal under the annuity probability measure $\mathbb{Q}^\star$.
The problem is that these different forward and swap rates are related, and their dynamics
are not independent.


9.1.2.5 The HJM model 


Until the beginning of the nineties, the state variable of fixed income models was the
instantaneous interest rate $r(t)$. For instance, this is the case of the models of Vasicek (1977) and
Cox et al. (1985). However, we have seen that we face some calibration issues when considering
such a framework. Heath et al. (1992) then proposed using forward rates, and not spot rates,
as the state variables. Under the risk-neutral probability measure $\mathbb{Q}$, the dynamics of
the instantaneous forward rate for the maturity $T$ is given by:

\[ f(t, T) = f(0, T) + \int_0^t \alpha(s, T)\,ds + \int_0^t \sigma(s, T)\,dW^{\mathbb{Q}}(s) \]

where $f(0, T)$ is the current forward rate. Therefore, the stochastic differential equation is:

\[ df(t, T) = \alpha(t, T)\,dt + \sigma(t, T)\,dW^{\mathbb{Q}}(t) \tag{9.20} \]
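Bond pricing. We recall that:

\[ B(t, T) = e^{-\int_t^T f(t, u)\,du} \]

If we note $X(t) = -\int_t^T f(t, u)\,du$, we have:

\[
\begin{aligned}
dX(t) &= f(t, t)\,dt - \int_t^T df(t, u)\,du \\
&= f(t, t)\,dt - \left( \int_t^T \alpha(t, u)\,du \right) dt - \left( \int_t^T \sigma(t, u)\,du \right) dW^{\mathbb{Q}}(t) \\
&= \left( f(t, t) + a(t, T) \right) dt + b(t, T)\,dW^{\mathbb{Q}}(t)
\end{aligned}
\]

where:

\[ a(t, T) = -\int_t^T \alpha(t, u)\,du \]

and:

\[ b(t, T) = -\int_t^T \sigma(t, u)\,du \]

We deduce that:

\[
\begin{aligned}
dB(t, T) &= e^{X(t)}\,dX(t) + \frac{1}{2} e^{X(t)} \left\langle dX(t), dX(t) \right\rangle \\
&= \left( f(t, t) + a(t, T) + \frac{1}{2} b^2(t, T) \right) B(t, T)\,dt + b(t, T)\,B(t, T)\,dW^{\mathbb{Q}}(t)
\end{aligned}
\]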
Since $f(t, t)$ is equal to the spot rate $r(t)$, the HJM model implies the following restriction²⁶:

\[ \alpha(t, T) = \sigma(t, T) \int_t^T \sigma(t, u)\,du \tag{9.21} \]

Equation (9.21) is known as the ‘drift restriction’ and is necessary to rule out arbitrage
opportunities. In this case, we verify that the discounted zero-coupon bond is a martingale
under the risk-neutral probability measure $\mathbb{Q}$:

\[ dB(t, T) = r(t)\,B(t, T)\,dt + b(t, T)\,B(t, T)\,dW^{\mathbb{Q}}(t) \]
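The drift restriction (9.21) can be evaluated numerically for any volatility function. The sketch below uses a trapezoidal rule and compares the result with closed forms for two volatility structures. The function name `hjm_drift`, the exponential volatility $\sigma(t,T) = s\,e^{-a(T-t)}$ and the parameter values are assumptions introduced for this illustration.

```python
from math import exp

def hjm_drift(sigma, t, T, n=2000):
    """alpha(t, T) = sigma(t, T) * integral_t^T sigma(t, u) du, by the trapezoidal rule."""
    h = (T - t) / n
    integral = 0.5 * (sigma(t, t) + sigma(t, T))
    integral += sum(sigma(t, t + k * h) for k in range(1, n))
    return sigma(t, T) * integral * h

s, a = 0.01, 0.5          # hypothetical volatility level and decay speed
t, T = 1.0, 5.0

# Exponentially decaying volatility (assumed form)
expo_vol = lambda t_, T_: s * exp(-a * (T_ - t_))
numeric_expo = hjm_drift(expo_vol, t, T)
closed_expo = s**2 * exp(-a * (T - t)) * (1.0 - exp(-a * (T - t))) / a

# Constant volatility: the drift is s^2 (T - t)
const_vol = lambda t_, T_: s
numeric_const = hjm_drift(const_vol, t, T)
closed_const = s**2 * (T - t)

print(numeric_expo, closed_expo)
print(numeric_const, closed_const)
```

In both cases the drift of the forward rate is fully determined by the volatility structure, which is the practical content of the restriction.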


Dynamics of spot and forward rates. The drift restriction implies that the dynamics
of the instantaneous forward rate $f(t, T)$ is given by:

\[ df(t, T) = \left( \sigma(t, T) \int_t^T \sigma(t, u)\,du \right) dt + \sigma(t, T)\,dW^{\mathbb{Q}}(t) \]

Therefore, we have:

\[ f(t, T) = f(0, T) + \int_0^t \left( \sigma(s, T) \int_s^T \sigma(s, u)\,du \right) ds + \int_0^t \sigma(s, T)\,dW^{\mathbb{Q}}(s) \]

If we are interested in the instantaneous spot rate $r(t)$, we obtain:

\[
\begin{aligned}
r(t) &= f(t, t) \\
&= f(0, t) + \int_0^t \left( \sigma(s, t) \int_s^t \sigma(s, u)\,du \right) ds + \int_0^t \sigma(s, t)\,dW^{\mathbb{Q}}(s)
\end{aligned}
\]

Forward probability measure. We now consider the dynamics of the forward rate
$f(t, T_1)$ under the forward probability measure $\mathbb{Q}^\star(T_2)$ with $T_2 > T_1$. We recall that
the new numéraire $N(t)$ is given by:

\[ N(t) = B(t, T_2) = e^{-\int_t^{T_2} f(t, u)\,du} \]


26 Indeed, we must have:

\[ a(t, T) + \frac{1}{2} b^2(t, T) = 0 \]

or:

\[ \partial_T a(t, T) = -b(t, T) \cdot \partial_T b(t, T) \]

In Exercise 9.4.5 on page 596, we show that:

\[ df(t, T_1) = -\left( \sigma(t, T_1) \int_{T_1}^{T_2} \sigma(t, u)\,du \right) dt + \sigma(t, T_1)\,dW^{\mathbb{Q}^\star(T_2)}(t) \]

It follows that $f(t, T_1)$ is a martingale under the forward probability measure $\mathbb{Q}^\star(T_1)$:

\[ df(t, T_1) = \sigma(t, T_1)\,dW^{\mathbb{Q}^\star(T_1)}(t) \]

We can also show that $B(t, T_2) / B(t, T_1)$ is a martingale under $\mathbb{Q}^\star(T_1)$ and we have:

\[ B(T_1, T_2) = \frac{B(t, T_2)}{B(t, T_1)} \exp \left( \int_t^{T_1} g(u)\,dW^{\mathbb{Q}^\star(T_1)}(u) - \frac{1}{2} \int_t^{T_1} g^2(u)\,du \right) \]

where:

\[ g(t) = b(t, T_2) - b(t, T_1) \]


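Some examples. If we assume that $\sigma(t, T)$ is constant and equal to $\sigma$, we obtain:

\[ f(t, T) = f(0, T) + \sigma^2 t \left( T - \frac{t}{2} \right) + \sigma W^{\mathbb{Q}}(t) \]

and:

\[ r(t) = f(0, t) + \frac{\sigma^2 t^2}{2} + \sigma W^{\mathbb{Q}}(t) \]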


This case corresponds to the Gaussian model of Ho and Lee (1986). 
Brigo and Mercurio (2006) consider the case of separable volatility: 


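\[ \sigma(t, T) = \xi(t)\,\psi(T) \]

We have:

\[ dr(t) = \left( \partial_t f(0, t) + \psi^2(t) \int_0^t \xi^2(s)\,ds + \frac{\psi'(t)}{\psi(t)} \left( r(t) - f(0, t) \right) \right) dt + \xi(t)\,\psi(t)\,dW^{\mathbb{Q}}(t) \]

For example, if we set $\sigma(t, T) = \sigma e^{-a(T-t)}$, we have $\xi(t) = \sigma e^{at}$, $\psi(T) = e^{-aT}$ and²⁷:

\[ dr(t) = \left( \partial_t f(0, t) + \sigma^2 \, \frac{1 - e^{-2at}}{2a} + a \left( f(0, t) - r(t) \right) \right) dt + \sigma\,dW^{\mathbb{Q}}(t) \]

We retrieve the generalized Vasicek model proposed by Hull and White (1994):

\[ dr(t) = a \left( b(t) - r(t) \right) dt + \sigma\,dW^{\mathbb{Q}}(t) \]

where $b(t)$ is given by Equation (9.15) on page 517.

27 We have:

\[ \psi^2(t) \int_0^t \xi^2(s)\,ds = \sigma^2 e^{-2at} \int_0^t e^{2as}\,ds = \sigma^2 \, \frac{1 - e^{-2at}}{2a} \]

and:

\[ \frac{\psi'(t)}{\psi(t)} = -a \]

Ritchken and Sankarasubramanian (1995) have identified necessary and sufficient conditions
on the functions $\xi$ and $\psi$ in order to obtain a Markovian short-rate process. They
showed that they must satisfy the following conditions:

\[ \xi(t) = \sigma(t)\,e^{\int_0^t \kappa(s)\,ds} \]

and:

\[ \psi(T) = e^{-\int_0^T \kappa(s)\,ds} \]

where $\sigma(t)$ and $\kappa(t)$ are two $\mathcal{F}_t$-adapted processes. In this case, we obtain:

\[ dr(t) = \left( \partial_t f(0, t) + \int_0^t \sigma^2(s)\,e^{-2 \int_s^t \kappa(u)\,du}\,ds + \kappa(t) \left( f(0, t) - r(t) \right) \right) dt + \sigma(t)\,dW^{\mathbb{Q}}(t) \]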


For instance, the generalized Vasicek model is a special case of this framework where the
two functions $\sigma(t)$ and $\kappa(t)$ are constant²⁸.


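Extension to multi-factor models. We can show that the previous results can be extended
when we assume that the instantaneous forward rate is given by the following SDE:

\[ df(t, T) = \alpha(t, T)\,dt + \sigma(t, T)^\top dW^{\mathbb{Q}}(t) \]

where $W^{\mathbb{Q}}(t) = \left( W_1^{\mathbb{Q}}(t), \ldots, W_n^{\mathbb{Q}}(t) \right)$ is an $n$-dimensional Brownian motion and $\rho$ is the
correlation matrix of $W^{\mathbb{Q}}(t)$. For instance, the drift restriction (9.21) becomes:

\[ \alpha(t, T) = \sigma(t, T)^\top \rho \int_t^T \sigma(t, u)\,du \]

In the two-dimensional case, we obtain:

\[
\begin{aligned}
df(t, T) ={} & \left( \sigma_1(t, T) \int_t^T \sigma_1(t, u)\,du \right) dt + \left( \sigma_2(t, T) \int_t^T \sigma_2(t, u)\,du \right) dt \\
& + \rho_{1,2} \left( \sigma_1(t, T) \int_t^T \sigma_2(t, u)\,du + \sigma_2(t, T) \int_t^T \sigma_1(t, u)\,du \right) dt \\
& + \sigma_1(t, T)\,dW_1^{\mathbb{Q}}(t) + \sigma_2(t, T)\,dW_2^{\mathbb{Q}}(t)
\end{aligned}
\]

For example, Heath et al. (1992) extend the Vasicek model by assuming that $\sigma_1(t, T) = \sigma_1$,
$\sigma_2(t, T) = \sigma_2 e^{-a_2 (T - t)}$ and $\rho_{1,2} = 0$. In this case, we obtain:

\[
\begin{aligned}
r(t) ={} & f(0, t) + \frac{\sigma_1^2 t^2}{2} + \frac{\sigma_2^2}{a_2^2} \left( \left( 1 - e^{-a_2 t} \right) - \frac{1}{2} \left( 1 - e^{-2 a_2 t} \right) \right) \\
& + \sigma_1 W_1^{\mathbb{Q}}(t) + \sigma_2 \int_0^t e^{-a_2 (t - s)}\,dW_2^{\mathbb{Q}}(s)
\end{aligned}
\]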


9.1.2.6 Market models 


One of the disadvantages of short-rate and HJM models is that they focus on instantaneous
spot or forward interest rates. However, these quantities are unobservable. At the end
of the nineties, academics developed two families of models in order to bypass this
drawback: the Libor market model (LMM) and the swap market model (SMM).


28 We have $\sigma(t) = \sigma$ and $\kappa(t) = a$.

The Libor market model. The Libor market model has been introduced by Brace et al.
(1997) and is also known as the BGM model in reference to the names of Brace, Gatarek
and Musiela. We recall that the Libor rate is defined as a simple forward rate:

\[ L(t, T_i, T_{i+1}) = \frac{1}{T_{i+1} - T_i} \left( \frac{B(t, T_i)}{B(t, T_{i+1})} - 1 \right) \]

In order to simplify the notation, we write $L_i(t) = L(t, T_i, T_{i+1})$. Under the forward
probability measure $\mathbb{Q}^\star(T_{i+1})$, the Libor rate $L_i(t)$ is a martingale:

\[ dL_i(t) = \gamma_i(t)\,L_i(t)\,dW_i^{\mathbb{Q}^\star(T_{i+1})}(t) \tag{9.22} \]

Then, we can use the Black formula (9.18) on page 521 to price caplets and floorlets where
the volatility $\sigma_i$ is defined by:

\[ \sigma_i = \sqrt{ \frac{1}{T_i - t} \int_t^{T_i} \gamma_i^2(s)\,ds } \]
Therefore, we can price caps and floors because they are just a sum of caplets and floorlets.

Flat or spot implied volatility. We can define two surfaces of implied volatilities. Since
we observe the market prices of caps and floors, we can deduce the corresponding implied
volatilities by assuming that the volatility in the Black model is constant. Thus, we have:

\[
\begin{aligned}
\mathrm{Cap}_n(t) &= \mathrm{Cap}(t, T_0, T_1, \ldots, T_n) \\
&= \sum_{i=1}^{n} \mathrm{Caplet}(t, T_{i-1}, T_i) \\
&= \sum_{i=1}^{n} \mathrm{Caplet}_i(t)
\end{aligned}
\]

where $\mathrm{Caplet}_i(t) = C(L_{i-1}(t), K, \sigma_{i-1}, T_i)$ and $C(L, K, \sigma, T)$ is the Black formula with
volatility $\sigma$. The implied volatility $\Sigma(K, T_n)$ is then obtained by solving the following
equation:

\[ \sum_{i=1}^{n} C(L_{i-1}(t), K, \Sigma, T_i) = \mathrm{Cap}_n(t) \]

The implied volatility is also called the ‘flat’ volatility and is denoted by $\Sigma^{\mathrm{flat}}(K, T_n)$. In this
case, there is a flat implied volatility for each strike $K$ and each maturity $T_n$ of caps/floors.
However, we can also compute an implied volatility $\Sigma(K, T_{i-1})$ for each caplet. We have:

\[
\begin{aligned}
\mathrm{Cap}_n(t) &= \mathrm{Cap}(t, T_0, T_1, \ldots, T_n) \\
&= \sum_{i=1}^{n} \mathrm{Caplet}(t, T_{i-1}, T_i) \\
&= \sum_{i=1}^{n} C(L_{i-1}(t), K, \Sigma(K, T_{i-1}), T_i)
\end{aligned}
\]

The estimation of the implied volatility surface is obtained by minimizing the sum of squared
residuals between observed and theoretical prices. In this case, the implied volatility is called
the ‘spot’ volatility and is denoted by $\Sigma^{\mathrm{spot}}(K, T_{i-1})$.

Example 86 We consider 6 caplets on the 3M Libor rate, whose strike is equal to 3%. The
tenor structures are respectively (3M,6M), (6M,9M), (9M,12M), (12M,15M), (15M,18M)
and (18M,21M). In the following table, we indicate the price of the six caps²⁹, whose notional
is equal to \$1 mn.

Maturity of the cap        6M       9M      12M      15M      18M      21M
Cap price              151.50   529.74  1259.38  2221.82  3295.31  4594.40

We indicate below the current value of the forward Libor rate, and also the value of the
zero-coupon rate.

Start date T_{i-1}         3M       6M       9M      12M      15M      18M
Maturity T_i               6M       9M      12M      15M      18M      21M
Forward Libor rate      3.05%    3.15%    3.30%    3.40%    3.45%    3.55%
Zero-coupon rate        3.05%    3.10%    3.15%    3.20%    3.25%    3.30%

Given the term structure of the volatility, we can price the caplets and the caps³⁰. Since
we have the price of the caps, we can calibrate the flat and spot implied volatilities. We
obtain the results given in Table 9.5.
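The flat volatility calibration can be sketched as a one-dimensional root search on the cap price. The code below uses the market data of Example 86 and a bisection, which works because the Black cap price is increasing in the volatility. Two conventions are assumptions on my part and are not stated in the example: the zero-coupon rate is taken as continuously compounded and attached to the maturity date $T_i$, and each accrual period is exactly 0.25 year. Function names are illustrative.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_caplet(libor, strike, sigma, t_fix, delta, bond, notional):
    """Black caplet price at t = 0 (Equation 9.18)."""
    total_vol = sigma * sqrt(t_fix)
    d1 = log(libor / strike) / total_vol + 0.5 * total_vol
    d2 = d1 - total_vol
    return notional * delta * bond * (libor * norm_cdf(d1) - strike * norm_cdf(d2))

# Data of Example 86
notional, strike, delta = 1e6, 0.03, 0.25
fixings = [0.25, 0.50, 0.75, 1.00, 1.25, 1.50]            # start dates T_{i-1}
forwards = [0.0305, 0.0315, 0.0330, 0.0340, 0.0345, 0.0355]
zero_rates = [0.0305, 0.0310, 0.0315, 0.0320, 0.0325, 0.0330]
cap_prices = [151.50, 529.74, 1259.38, 2221.82, 3295.31, 4594.40]

def cap_price(n_caplets, sigma):
    """Price of the cap made of the first n_caplets caplets, all with the same volatility."""
    total = 0.0
    for i in range(n_caplets):
        # Assumption: continuously compounded zero rate for the maturity T_i
        bond = exp(-zero_rates[i] * (fixings[i] + delta))
        total += black_caplet(forwards[i], strike, sigma, fixings[i], delta, bond, notional)
    return total

def flat_vol(n_caplets, target, lo=1e-4, hi=1.0, tol=1e-10):
    """Bisection on sigma such that cap_price(n_caplets, sigma) = target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cap_price(n_caplets, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

flat_vols = [flat_vol(n + 1, cap_prices[n]) for n in range(len(cap_prices))]
print([round(100 * v, 3) for v in flat_vols])
```

Under these conventions the first values should come out close to the flat volatilities reported in Table 9.5. The spot volatilities can be obtained in the same way by stripping the caps one caplet at a time.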


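TABLE 9.5: Calibration of $\Sigma^{\mathrm{flat}}(K, T_n)$, $\Sigma^{\mathrm{spot}}(K, T_i)$ and $\gamma_i$

 T_n   Σ^flat(K, T_n)  |  T_i   Σ^spot(K, T_i)  |  T_i     γ_i
  6M       5.000%      |   3M       5.000%      |   3M   5.000%
  9M       5.083%      |   6M       5.199%      |   6M   5.391%
 12M       5.130%      |   9M       5.449%      |   9M   5.918%
 15M       5.158%      |  12M       5.497%      |  12M   5.637%
 18M       5.192%      |  15M       5.557%      |  15M   5.794%
 21M       5.214%      |  18M       5.616%      |  18M   5.899%

We consider that the functions $\gamma_i(t)$ are the same and are equal to $\gamma(t)$. If we assume that
$\gamma(t)$ is a piecewise constant function, we have:

\[
\gamma(t) =
\begin{cases}
\gamma_0 & \text{if } t \in [0, T_0] \\
\gamma_i & \text{if } t \in \, ]T_{i-1}, T_i]
\end{cases}
\]

It follows that:

\[ \int_0^{T_i} \gamma^2(s)\,ds = \int_0^{T_{i-1}} \gamma^2(s)\,ds + \int_{T_{i-1}}^{T_i} \gamma^2(s)\,ds \]

or:

\[ T_i \, \Sigma^{\mathrm{spot}}(K, T_i)^2 = T_{i-1} \, \Sigma^{\mathrm{spot}}(K, T_{i-1})^2 + (T_i - T_{i-1}) \, \gamma_i^2 \]

We deduce that:

\[ \gamma_0 = \Sigma^{\mathrm{spot}}(K, T_0) \]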

29 The $i$-th cap is the sum of the first $i$ caplets.
30 For instance, if we assume that the volatility $\sigma_i$ for the second caplet is 5%, we obtain:

\[ \mathrm{Caplet}(0, \mathrm{6M}, \mathrm{9M}) = 10^6 \times 0.25 \times e^{-0.75 \times 3.05\%} \times \left( 3.15\% \times \Phi(d_1) - 3\% \times \Phi(d_2) \right) = \$394.48 \]

where:

\[ d_1 = \frac{1}{5\% \times \sqrt{0.5}} \ln \left( \frac{3.15\%}{3\%} \right) + \frac{1}{2} \times 5\% \times \sqrt{0.5} = 1.3977 \]

and:

\[ d_2 = d_1 - 5\% \times \sqrt{0.5} = 1.3623 \]

and:

\[ \gamma_i = \sqrt{ \frac{T_i \, \Sigma^{\mathrm{spot}}(K, T_i)^2 - T_{i-1} \, \Sigma^{\mathrm{spot}}(K, T_{i-1})^2}{T_i - T_{i-1}} } \]

Therefore, we can use the spot volatilities to calibrate the function $\gamma(t)$ (see Table 9.5 and
Figure 9.16).
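The bootstrap of the piecewise constant function $\gamma(t)$ from the spot volatilities is a simple recursion on total variances. The sketch below applies it to the spot volatilities of Table 9.5 (expressed in %); the function name `piecewise_gamma` is an assumption for illustration.

```python
from math import sqrt

def piecewise_gamma(fixing_dates, spot_vols):
    """Bootstrap gamma_i from T_i * Sigma_i^2 = T_{i-1} * Sigma_{i-1}^2 + (T_i - T_{i-1}) * gamma_i^2."""
    gammas = [spot_vols[0]]                      # gamma_0 equals the first spot volatility
    for i in range(1, len(fixing_dates)):
        forward_variance = (
            fixing_dates[i] * spot_vols[i] ** 2
            - fixing_dates[i - 1] * spot_vols[i - 1] ** 2
        ) / (fixing_dates[i] - fixing_dates[i - 1])
        # A negative value here would signal inconsistent spot volatilities
        gammas.append(sqrt(forward_variance))
    return gammas

fixing_dates = [0.25, 0.50, 0.75, 1.00, 1.25, 1.50]      # 3M, 6M, ..., 18M
spot_vols = [5.000, 5.199, 5.449, 5.497, 5.557, 5.616]   # in %, from Table 9.5

gammas = piecewise_gamma(fixing_dates, spot_vols)
print([round(g, 3) for g in gammas])
```

Because the inputs are rounded to three decimals, the output matches the last column of Table 9.5 only up to small rounding differences.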


[Figure: flat volatility, spot volatility and γ(t) (in %, from 4.9 to 6.0) plotted against maturity (3M, 6M, 9M, 1Y, 15M, 18M, 21M).]

FIGURE 9.16: Flat and spot implied volatilities 


Remark 102 There is a lag between the flat volatility and the spot volatility, because we 
use the convention that the flat volatility is measured at the maturity date of the cap while 
the spot volatility is measured at the fixing date. In the previous example, the first flat 
volatility corresponds to the 6-month maturity date of the cap, whereas the first spot volatility 
corresponds to the 3-month fixing date of the caplet. 


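Dynamics under other probability measures. The dynamics (9.22) is valid for the
Libor forward rate $L(t, T_i, T_{i+1})$. Then, we have:

\[
\begin{aligned}
dL_0(t) &= \gamma_0(t)\,L_0(t)\,dW_0^{\mathbb{Q}^\star(T_1)}(t) \\
&\;\;\vdots \\
dL_{n-1}(t) &= \gamma_{n-1}(t)\,L_{n-1}(t)\,dW_{n-1}^{\mathbb{Q}^\star(T_n)}(t)
\end{aligned}
\]

It is obvious that the Wiener processes $(W_0, \ldots, W_{n-1})$ are correlated. We can show that
the dynamics of $L_i(t)$ under the probability measure $\mathbb{Q}^\star(T_{k+1})$ is equal to:

\[ \frac{dL_i(t)}{L_i(t)} = \mu_{i,k}(t)\,dt + \gamma_i(t)\,dW_i^{\mathbb{Q}^\star(T_{k+1})}(t) \]

where³¹:

\[ \mu_{i,k}(t) = -\gamma_i(t) \sum_{j=i+1}^{k} \rho_{i,j} \, \gamma_j(t) \, \frac{(T_{j+1} - T_j) \, L_j(t)}{1 + (T_{j+1} - T_j) \, L_j(t)} \quad \text{if } k > i \]

and $\rho_{i,j}$ is the correlation between $W_i^{\mathbb{Q}^\star(T_{i+1})}$ and $W_j^{\mathbb{Q}^\star(T_{j+1})}$.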

Brigo and Mercurio (2006) derive the risk-neutral dynamics of the forward Libor rate
$L_i(t)$ when we use the spot numéraire $M(t) = \exp \left( \int_0^t r(s)\,ds \right)$. However, the expression
is complicated and it is not very useful from a practical point of view. This is why they
define another version of the spot numéraire, when the money market account is rebalanced
only on the resetting dates $T_0, T_1, \ldots, T_{n-1}$. Let $\varphi(t)$ be the next resetting date index after
time $t$, meaning that $\varphi(t) = i$ if $T_{i-1} < t \le T_i$. The spot Libor numéraire is then defined
as follows:

\[ M^\dagger(t) = B(t, T_{\varphi(t)}) \prod_{j=0}^{\varphi(t) - 1} \left( 1 + \delta_j L_j(T_j) \right) \]

and we have:

\[ \frac{dL_i(t)}{L_i(t)} = \gamma_i(t) \sum_{j=\varphi(t)}^{i} \rho_{i,j} \, \gamma_j(t) \, \frac{\delta_j L_j(t)}{1 + \delta_j L_j(t)}\,dt + \gamma_i(t)\,dW_i^{\dagger}(t) \]

where $W_i^{\dagger}(t)$ is a Brownian motion when the numéraire is $M^\dagger(t)$.


The swap market model. Since forward Libor rates $L_i(t)$ are log-normally distributed,
the forward swap rate $\mathrm{Sw}(t)$ cannot be log-normal. Then, the Black formula cannot be
applied to price swaptions³². However, we can always price swaptions using Monte Carlo
methods by considering the spot measure (Glasserman, 2003). To circumvent this issue,
Jamshidian (1997) proposed a model where the swap rate is a martingale under the annuity
probability measure $\mathbb{Q}^\star$:

\[ d\mathrm{Sw}(t) = \eta(t)\,\mathrm{Sw}(t)\,dW^{\mathbb{Q}^\star}(t) \]

Again, we can use the Black formula for pricing swaptions. However, we face the same
problem as previously, because forward swap and Libor rates cannot be both log-normal.


9.2 Volatility risk 


In the first section of this chapter, we have seen canonical models (Black-Scholes, Black,
HJM and LMM) used to price options. In fact, they are not really ‘option’ pricing models in
the sense that European options such as calls, puts, caps, floors and swaptions are observed
in the market. Indeed, they are more ‘volatility’ pricing models, because they give a price
to the implied volatility of European options. Knowing the implied volatility surface, the
trader can then price exotic or OTC derivatives, and more importantly, define corresponding
hedging portfolios.


31 If $k < i$, we have:

\[ \mu_{i,k}(t) = \gamma_i(t) \sum_{j=k+1}^{i} \rho_{i,j} \, \gamma_j(t) \, \frac{(T_{j+1} - T_j) \, L_j(t)}{1 + (T_{j+1} - T_j) \, L_j(t)} \]

32 Nevertheless, there exist several approximations for pricing swaptions (Rebonato, 2002).




9.2.1 The uncertain volatility model 


On page 512, we have seen that the P&L of the replicating strategy is given by the
formula of El Karoui et al. (1998):

\[ V(T) - f(S(T)) = \frac{1}{2} \int_0^T e^{r(T - t)} \, \Gamma(t) \left( \Sigma^2(T, K) - \sigma^2(t) \right) S^2(t)\,dt \]

If we assume that $\sigma(t) \in [\sigma^-, \sigma^+]$, we obtain a simple rule for achieving a positive P&L:

• if $\Gamma(t) \ge 0$, we have to hedge the portfolio by considering an implied volatility that
is equal to the upper bound $\sigma^+$;

• if $\Gamma(t) \le 0$, we set the implied volatility to the lower bound $\sigma^-$.

This rule is valid if the gamma of the option is always positive or always negative, that is when
the payoff is convex (or concave). Avellaneda et al. (1995) extend this rule when the gamma can change
its sign during the life of the option. This is the case of many exotic options, which depend
on conditional events (butterfly, barrier, call spread, ratchet, etc.).


9.2.1.1 Formulation of the partial differential equation 


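We assume that the dynamics of the underlying price is given by the following SDE:

\[ dS(t) = r(t)\,S(t)\,dt + \sigma(t)\,S(t)\,dW^{\mathbb{Q}}(t) \tag{9.23} \]

where:

\[ \sigma^- \le \sigma(t) \le \sigma^+ \tag{9.24} \]

Let $V(t, S(t))$ be the option price, whose payoff is $f(S(T))$. Avellaneda et al. (1995) show
that $V(t, S(t))$ is bounded:

\[ V^-(t, S(t)) \le V(t, S(t)) \le V^+(t, S(t)) \]

where $V^-(t, S(t)) = \inf_{\mathbb{Q}(\sigma)} \mathbb{E}^{\mathbb{Q}(\sigma)} \left[ \exp \left( -\int_t^T r(s)\,ds \right) f(S(T)) \right]$,
$V^+(t, S(t)) = \sup_{\mathbb{Q}(\sigma)} \mathbb{E}^{\mathbb{Q}(\sigma)} \left[ \exp \left( -\int_t^T r(s)\,ds \right) f(S(T)) \right]$ and $\mathbb{Q}(\sigma)$ denotes all the probability measures
such that Equations (9.23) and (9.24) hold. We can then show that $V^-$ and $V^+$ satisfy the
HJB equation, with the supremum for $V^+$ and the infimum for $V^-$:

\[ \sup / \inf_{\sigma^- \le \sigma(t) \le \sigma^+} \left( \frac{1}{2} \sigma^2(t)\,S^2\,\frac{\partial^2 V(t, S)}{\partial S^2} + b(t)\,S\,\frac{\partial V(t, S)}{\partial S} \right) + \frac{\partial V(t, S)}{\partial t} - r(t)\,V(t, S) = 0 \]

Solving the HJB equation is equivalent to solving the modified Black-Scholes PDE:

\[
\begin{cases}
\frac{1}{2} \sigma^2(\Gamma(t, S))\,S^2\,\partial_S^2 V(t, S) + b(t)\,S\,\partial_S V(t, S) + \partial_t V(t, S) - r(t)\,V(t, S) = 0 \\
V(T, S(T)) = f(S(T))
\end{cases}
\]

where:

\[
\sigma(x) =
\begin{cases}
\sigma^+ & \text{if } x \ge 0 \\
\sigma^- & \text{if } x < 0
\end{cases}
\quad \text{for } V(t, S(t)) = V^+(t, S(t))
\]

and:

\[
\sigma(x) =
\begin{cases}
\sigma^- & \text{if } x > 0 \\
\sigma^+ & \text{if } x \le 0
\end{cases}
\quad \text{for } V(t, S(t)) = V^-(t, S(t))
\]

Since $\Gamma(t, S) = \partial_S^2 V(t, S)$ may change its sign during the time interval $[t, T]$, we have to
solve the PDE numerically. A solution consists in using finite difference methods described
in Appendix A.1.2.4 on page 1041.

Let $u_i^m$ be the numerical solution of $V(t_m, S_i)$. At each iteration $m$, we approximate the
gamma coefficient by the central difference method:

\[ \Gamma(t_m, S_i) \approx \frac{u_{i+1}^m - 2 u_i^m + u_{i-1}^m}{h^2} \]

By assuming that:

\[ \operatorname{sign} \left( \Gamma(t_m, S_i) \right) \approx \operatorname{sign} \left( \Gamma(t_{m+1}, S_i) \right) \]

we can compute the values taken by $\sigma(\Gamma(t, S))$ and solve the PDE for the next iteration
$m + 1$.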
9.2.1.2 Computing lower and upper pricing bounds 
If we consider the European call option, we have $\Gamma(t, S) > 0$, meaning that:

\[ V^+(t, S(t)) = C_{\mathrm{BS}}(t, S(t), \sigma^+) \]

and:

\[ V^-(t, S(t)) = C_{\mathrm{BS}}(t, S(t), \sigma^-) \]

where $C_{\mathrm{BS}}(t, S, \Sigma)$ is the Black-Scholes price at time $t$ when the underlying price is equal
to $S$ and the implied volatility is equal to $\Sigma$. Then, the worst-case scenario occurs when the
volatility $\sigma(t)$ reaches the upper bound $\sigma^+$.
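The numerical procedure can be illustrated on the European call, for which the bounds are known in closed form. The sketch below is not the scheme of the appendix: it is a plain explicit finite difference scheme in which the volatility at each node is selected from the sign of the gamma computed on the already known time layer. The grid sizes, the truncation level `s_max`, the boundary treatment and the parameter values ($K = 100$, $T = 1$, $r = b = 5\%$, volatility between 15% and 25%) are assumptions for illustration.

```python
from math import erf, exp, log, sqrt

def bs_call(s, k, sigma, t, b, r):
    """Black-Scholes call price with cost-of-carry b."""
    d1 = (log(s / k) + (b + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return s * exp((b - r) * t) * cdf(d1) - k * exp(-r * t) * cdf(d2)

def uvm_call(k, t_mat, r, b, sig_lo, sig_hi, upper, s_max=300.0, n_s=150, n_t=1600):
    """Explicit scheme for the modified Black-Scholes PDE of the uncertain volatility model."""
    h, dt = s_max / n_s, t_mat / n_t              # dt is small enough for stability here
    grid = [i * h for i in range(n_s + 1)]
    u = [max(s - k, 0.0) for s in grid]           # terminal payoff
    for _ in range(n_t):                          # march backward from T to 0
        new = u[:]
        new[0] = u[0] * (1.0 - r * dt)            # at S = 0 only discounting remains
        for i in range(1, n_s):
            gamma = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / h**2
            delta = (u[i + 1] - u[i - 1]) / (2.0 * h)
            if upper:                             # V+: high volatility where gamma >= 0
                sig = sig_hi if gamma >= 0.0 else sig_lo
            else:                                 # V-: low volatility where gamma > 0
                sig = sig_lo if gamma > 0.0 else sig_hi
            new[i] = u[i] + dt * (0.5 * sig**2 * grid[i]**2 * gamma
                                  + b * grid[i] * delta - r * u[i])
        new[n_s] = 2.0 * new[n_s - 1] - new[n_s - 2]   # linear extrapolation at s_max
        u = new
    return grid, u

grid, v_plus = uvm_call(100.0, 1.0, 0.05, 0.05, 0.15, 0.25, upper=True)
_, v_minus = uvm_call(100.0, 1.0, 0.05, 0.05, 0.15, 0.25, upper=False)
i0 = grid.index(100.0)                            # node where S_0 = 100
print(v_plus[i0], bs_call(100.0, 100.0, 0.25, 1.0, 0.05, 0.05))
print(v_minus[i0], bs_call(100.0, 100.0, 0.15, 1.0, 0.05, 0.05))
```

Since the call payoff is convex, the two numerical bounds should stay close to the Black-Scholes prices computed with 25% and 15%, up to the discretization error.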


This result is obtained because the delta of the option is a monotone function with
respect to the underlying price. However, this property does not hold for many derivative
contracts, in particular when the payoff is path dependent. In this case, the payoff depends
on the trajectory of the underlying asset. For instance, the payoff of a barrier option depends
on whether a certain barrier level was touched (or not touched) at some time during the life
of the option. We give here the payoff associated to the four main types of single barrier³³:

• down-and-in call and put options (DIC/DIP):

\[ f_{\mathrm{Barrier}}(S(T)) = \mathbb{1} \left\{ S_0 > L, \ \min_{t \in \mathcal{T}} S(t) \le L \right\} \cdot f_{\mathrm{Vanilla}}(S(T)) \]

• down-and-out call and put options (DOC/DOP):

\[ f_{\mathrm{Barrier}}(S(T)) = \mathbb{1} \left\{ S_0 > L, \ \min_{t \in \mathcal{T}} S(t) > L \right\} \cdot f_{\mathrm{Vanilla}}(S(T)) \]

• up-and-in call and put options (UIC/UIP):

\[ f_{\mathrm{Barrier}}(S(T)) = \mathbb{1} \left\{ S_0 < H, \ \max_{t \in \mathcal{T}} S(t) \ge H \right\} \cdot f_{\mathrm{Vanilla}}(S(T)) \]

• up-and-out call and put options (UOC/UOP):

\[ f_{\mathrm{Barrier}}(S(T)) = \mathbb{1} \left\{ S_0 < H, \ \max_{t \in \mathcal{T}} S(t) < H \right\} \cdot f_{\mathrm{Vanilla}}(S(T)) \]

33 We have:

\[
f_{\mathrm{Vanilla}}(S(T)) =
\begin{cases}
(S(T) - K)^+ & \text{for the call option} \\
(K - S(T))^+ & \text{for the put option}
\end{cases}
\]


In the case of knocked-out barrier payoffs (DOC/DOP, UOC/UOP), the option terminates
the first time the barrier is crossed, whereas for knocked-in barrier options (DIC/DIP,
UIC/UIP), the payoff is paid only if the underlying asset crosses the barrier. These barriers
can also be combined in order to obtain double barrier options:

• double knocked-in call and put options (KIC/KIP):

\[ f_{\mathrm{Barrier}}(S(T)) = \mathbb{1} \left\{ S(t) \notin [L, H], \ t \in \mathcal{T} \right\} \cdot f_{\mathrm{Vanilla}}(S(T)) \]

• double knocked-out call and put options (KOC/KOP):

\[ f_{\mathrm{Barrier}}(S(T)) = \mathbb{1} \left\{ S(t) \in [L, H], \ t \in \mathcal{T} \right\} \cdot f_{\mathrm{Vanilla}}(S(T)) \]

These options also depend on the time monitoring $t \in \mathcal{T}$ of the barriers. In particular, we
distinguish continuous ($\mathcal{T} = [0, T]$), window ($\mathcal{T} \subset [0, T]$) and discrete ($\mathcal{T} = \{t_1, t_2, \ldots, t_n\}$)
barriers.


Example 87 We consider a double KOC barrier option with the following parameters:
$K = 100$, $L = 80$, $H = 120$, $T = 1$, $b = 5\%$ and $r = 5\%$. We assume that the volatility $\sigma(t)$
lies in the range of 15% and 25%.


In the first and second panels of Figure 9.17, we report the price $V(0, S_0)$ of the call
option for the continuous barrier ($\mathcal{T} = [0, 1]$). If we use the Black-Scholes model³⁴, the
upper bound is reached when $\sigma(t) = \sigma^- = 15\%$ whereas the lower bound is reached when
$\sigma(t) = \sigma^+ = 25\%$. We have the feeling that the barrier price is a decreasing function of
the volatility. However, this is not true. Indeed, a high volatility increases the time value
of the final payoff $(S(T) - K)^+$, but also decreases the probability to remain within the
barrier interval $[L, H]$. Therefore, there is a trade-off between these two opposite effects.
If we consider the uncertain volatility model (UVM), the upper bound is larger than that
obtained with the BS model, because the worst-case scenario is to have a low volatility
when the asset price is close to one barrier and a high volatility when the asset price is
far away from the barriers. Therefore, the worst-case scenario at time $t$ depends on the
relative position of $S(t)$ with respect to $L$, $H$ and $K$. If we consider a window barrier with
$\mathcal{T} = [0.25, 0.75]$, we obtain the third and fourth panels of Figure 9.17. We notice that the
BS price is not monotone with respect to the volatility. When the current asset price $S_0$ is
equal to the strike $K$, the BS price is higher when $\sigma(t) = \sigma^- = 15\%$. This is not the case
when $S_0 = 150$. The reason is that a high volatility increases the probability that the asset
price is below the up barrier $H$ when the window is triggered. A high volatility is also good
when the window ends.
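The continuous-barrier case of Example 87 can be sketched with an explicit finite difference scheme restricted to the interval $[L, H]$, where the option value is set to zero on both barriers. This is only an illustrative scheme under assumed grid sizes (40 space steps, 1000 time steps); the function name is hypothetical, and the constant-volatility prices are obtained from the same routine by setting both volatility bounds to the same value.

```python
def double_ko_call(k, low, high, t_mat, r, b, sig_lo, sig_hi, upper=True,
                   n_s=40, n_t=1000):
    """Explicit scheme for a continuously monitored double knock-out call under the UVM."""
    h, dt = (high - low) / n_s, t_mat / n_t       # dt chosen small enough for stability
    grid = [low + i * h for i in range(n_s + 1)]
    u = [max(s - k, 0.0) for s in grid]           # vanilla payoff at maturity
    u[0] = u[n_s] = 0.0                           # knocked out on both barriers
    for _ in range(n_t):                          # march backward in time
        new = [0.0] * (n_s + 1)                   # barriers stay at zero
        for i in range(1, n_s):
            gamma = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / h**2
            delta = (u[i + 1] - u[i - 1]) / (2.0 * h)
            if upper:                             # upper bound: maximise the gamma term
                sig = sig_hi if gamma >= 0.0 else sig_lo
            else:                                 # lower bound: minimise the gamma term
                sig = sig_lo if gamma >= 0.0 else sig_hi
            new[i] = u[i] + dt * (0.5 * sig**2 * grid[i]**2 * gamma
                                  + b * grid[i] * delta - r * u[i])
        u = new
    return grid, u

args = (100.0, 80.0, 120.0, 1.0, 0.05, 0.05)      # K, L, H, T, r, b of Example 87
grid, uvm_up = double_ko_call(*args, 0.15, 0.25, upper=True)
_, uvm_lo = double_ko_call(*args, 0.15, 0.25, upper=False)
_, bs_15 = double_ko_call(*args, 0.15, 0.15)      # constant volatility 15%
_, bs_25 = double_ko_call(*args, 0.25, 0.25)      # constant volatility 25%

i0 = grid.index(100.0)                            # node where S_0 = 100
print(uvm_lo[i0], bs_25[i0], bs_15[i0], uvm_up[i0])
```

The UVM bounds should bracket both constant-volatility prices, which is the ordering discussed in the text.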


34 Prices can be computed by numerically solving the PDE, or using the closed-form formulas of Rubinstein
and Reiner (1991).

FIGURE 9.17: Comparing BS and UVM prices of the double KOC barrier option


9.2.1.3 Application to ratchet options


Ratchet or cliquet options are financial derivatives that provide a minimum return in
exchange for capping the maximum return. They are used by investors because they may
protect them against downside risk. Let us see an example to understand the underlying
mechanism of such derivative contracts.

We consider a cliquet option with a 3-year maturity on an equity index $S(t)$. The fixing
dates correspond to the end of each calendar year. We assume that the initial value $S_0$ of
the index is equal to 100. The payoff of the cliquet option is:


\[ f(S(T)) = N \cdot \sum_{j=1}^{3} \max \left( 0, \frac{S(T_j)}{S(T_{j-1})} - 1 \right) \]

where $\{T_1, T_2, T_3\}$ are the fixing dates and $N$ is the notional of the cliquet option. This
cliquet option accumulates positive annual returns. In the following table, we report
four trajectories of $S(T_j)$:

S(T_j)       #1       #2       #3       #4

S(0)        100      100      100      100
S(1)        120      110       95       90
S(2)         85      125       95       50
S(3)         90      135       75       70

Coupon    25.9%    31.6%       0%      40%
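The coupons of the table can be reproduced with a few lines of code. The sketch below applies the payoff formula above with a unit notional; the function name `cliquet_coupon` and the dictionary layout are assumptions for illustration.

```python
def cliquet_coupon(path):
    """Sum of the positive periodic returns along a list of fixing values."""
    return sum(max(0.0, path[j] / path[j - 1] - 1.0) for j in range(1, len(path)))

# The four trajectories of the table: S(0), S(1), S(2), S(3)
paths = {
    1: [100, 120, 85, 90],
    2: [100, 110, 125, 135],
    3: [100, 95, 95, 75],
    4: [100, 90, 50, 70],
}

coupons = {key: cliquet_coupon(path) for key, path in paths.items()}
for key, coupon in coupons.items():
    print(f"#{key}: {100 * coupon:.1f}%")
```

Trajectory #4 shows the ratchet mechanism clearly: the index ends 30% below its initial level, but the rebound from 50 to 70 in the last year still generates a 40% coupon.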


More generally, the payoff of a ratchet is:

\[ f(S(T)) = N \cdot \min \left( C_g, \max \left( F_g, \sum_{j=1}^{n} \max \left( F_\ell, \min \left( C_\ell, R_j - K_\ell \right) \right) - K_g \right) \right) \]

where $C_g$ is the global cap, $F_g$ is the global floor, $K_g$ is the global strike, $C_\ell$ is the local cap,
$F_\ell$ is the local floor and $K_\ell$ is the local strike. Here, $R_j$ is the return between two fixing
dates:

\[ R_j = \frac{S(T_j) - S(T_{j-1})}{S(T_{j-1})} \]

At the maturity, the buyer of the cliquet option receives the sum of periodic returns subject
to local and global caps, floors and strikes. In the market, one of the most common payoffs
is the following:

\[ f(S(T)) = N \cdot \max \left( F_g, \sum_{j=1}^{n} \max \left( 0, \min \left( C_\ell, \frac{S(T_j)}{S(T_{j-1})} - 1 \right) \right) \right) \]

With this payoff, the option buyer is hedged against the fall of the asset price and has the
guarantee to have a minimum return that is equal to the global floor $F_g$. In return,
the option buyer gives up part of the upside, because each periodic return is limited by the local cap $C_\ell$. Therefore, the payoff
of the option is bounded:

\[ N \cdot F_g \le f(S(T)) \le N \cdot \max \left( F_g, n C_\ell \right) \]
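These bounds can be checked by simulation. The sketch below prices the common cliquet payoff by Monte Carlo under Black-Scholes dynamics with a constant volatility, and compares the result with the discounted bounds. The parameter values are those of Example 88 given later in this section ($r = b = 5\%$, $F_g = 10\%$, $C_\ell = 12\%$, five annual fixings, $N = 1$); the volatility of 25%, the number of paths, the seed and the function names are assumptions for illustration.

```python
import random
from math import exp, sqrt

def cliquet_payoff(returns, local_cap, global_floor, notional=1.0):
    """N * max(F_g, sum_j max(0, min(C_l, R_j)))."""
    total = sum(max(0.0, min(local_cap, ret)) for ret in returns)
    return notional * max(global_floor, total)

def mc_cliquet_price(r, b, sigma, n_fix, dt, local_cap, global_floor,
                     n_paths=20000, seed=42):
    """Monte Carlo price with independent log-normal periodic returns."""
    rng = random.Random(seed)
    drift = (b - 0.5 * sigma**2) * dt
    vol = sigma * sqrt(dt)
    acc = 0.0
    for _ in range(n_paths):
        # Periodic return S(T_j) / S(T_{j-1}) - 1 under the risk-neutral measure
        rets = [exp(drift + vol * rng.gauss(0.0, 1.0)) - 1.0 for _ in range(n_fix)]
        acc += cliquet_payoff(rets, local_cap, global_floor)
    return exp(-r * n_fix * dt) * acc / n_paths

r, b, sigma = 0.05, 0.05, 0.25
n_fix, dt, local_cap, global_floor = 5, 1.0, 0.12, 0.10

price = mc_cliquet_price(r, b, sigma, n_fix, dt, local_cap, global_floor)
discount = exp(-r * n_fix * dt)
lower = discount * global_floor
upper = discount * max(global_floor, n_fix * local_cap)
print(lower, price, upper)
```

The simulated price necessarily lies between the two discounted bounds, since every simulated payoff does.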


The fundamental issue of cliquet option pricing is the choice of the volatility model to
price the forward call option:

\[ \left( \frac{S(T_j)}{S(T_{j-1})} - 1 \right)^+ \]

At first sight, we might consider the following solutions:

• we may use the implied forward volatility between $T_{j-1}$ and $T_j$, which is calculated
as follows:

\[ T_j \, \Sigma^2(T_j) = T_{j-1} \, \Sigma^2(T_{j-1}) + (T_j - T_{j-1}) \cdot \Sigma^2(T_{j-1}, T_j) \]

• we may also use the implied volatility of maturity $T_j - T_{j-1}$ at the date $T_{j-1}$; this
requires a dynamic model of the implied volatility surface.

Since the payoff is locally non-convex, it is not possible to calculate a conservative price using
the Black-Scholes model. In this case, looking for a single ‘good’ implied volatility is not appropriate.

Wilmott (2002) illustrates the difficulty of pricing cliquet options by comparing Black-Scholes
and uncertain volatility models. The BS price can be calculated using the Monte
Carlo method³⁵. Another solution is to derive the corresponding PDE. In this case, we have
to introduce two additional variables: $S' = S(T_{j-1})$ is the value of $S(t)$ at the previous
fixing date and $Q$ is a variable to keep track of the payoff:

\[ Q = \sum_{i=1}^{j-1} \max \left( 0, \min \left( C_\ell, \frac{S(T_i)}{S(T_{i-1})} - 1 \right) \right) \]

The value of the option depends then on four state variables:

\[ V = V(t, S, S', Q) \]


35 For that, we simulate the asset price at the fixing dates $\{0, T_0, \ldots, T_n, T\}$ using the risk-neutral
probability measure $\mathbb{Q}$ and we calculate the mean of the discounted payoff.

We deduce that $V(t, S, S', Q)$ satisfies the following PDE between two fixing dates $T_{j-1}$
and $T_j$:

\[ \frac{1}{2} \sigma^2 S^2 \partial_S^2 V(t, S, S', Q) + b(t)\,S\,\partial_S V(t, S, S', Q) + \partial_t V(t, S, S', Q) - r(t)\,V(t, S, S', Q) = 0 \]

whereas the final condition is:

\[ V(T, S, S', Q) = N \cdot \max \left( F_g, Q \right) \]

As noted by Wilmott (2002), $V(t, S, S', Q)$ must also satisfy the jump condition at the
fixing date $T_j$:

\[ V(T_j^-, S, S', Q) = V \left( T_j^+, S, S, Q + \max \left( 0, \min \left( C_\ell, \frac{S}{S'} - 1 \right) \right) \right) \]

This jump condition initializes the new value of $S'$ for the next period $[T_j, T_{j+1}]$ and updates
the payoff $Q$. By introducing the state variable $x = S / S'$, Wilmott reduces the dimension
of the problem to three variables $t$, $x$ and $Q$:

\[
\begin{cases}
\frac{1}{2} \sigma^2 x^2 \partial_x^2 V(t, x, Q) + b(t)\,x\,\partial_x V(t, x, Q) + \partial_t V(t, x, Q) - r(t)\,V(t, x, Q) = 0 \\
V(T_j^-, x, Q) = V \left( T_j^+, 1, Q + \max \left( 0, \min \left( C_\ell, x - 1 \right) \right) \right) \\
V(T, x, Q) = N \cdot \max \left( F_g, Q \right)
\end{cases}
\]

This PDE can easily be solved numerically and the price of the cliquet option is equal to
$V(0, 1, 0)$. For the uncertain volatility model, we have exactly the same PDE, except that
the quadratic term is replaced by $\frac{1}{2} \sigma^2(\Gamma(t, x))\,x^2\,\partial_x^2 V(t, x, Q)$.


Example 88 We consider a cliquet option with the following parameters: $r = 5\%$, $b = 5\%$,
$F_g = 10\%$, $C_\ell = 12\%$ and $N = 1$. The maturity is equal to 5 years, and there are 5 annual
fixing dates. The volatility $\sigma(t)$ lies in the range 20% to 30%.


In Figure 9.18, we show the PDE solution $V(0, x, 0)$ for constant volatility and volatility
ranges. We notice that the BS price is not very sensitive to the volatility. With respect to
the mid volatility $\sigma = 25\%$, the BS price increases by 1.35% if the volatility is 30% and
decreases by 1.57% if the volatility is 20%. On the contrary, the UVM price range ($V^+ - V^-$)
represents 34% of the BS price. This result depends on the values of the global floor and the
local cap. An illustration is provided in Figure 9.19, which gives the relationship³⁶ between
the cliquet option price $V(0, 1, 0)$ and the local cap $C_\ell$.


9.2.2 The shifted log-normal model 


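This model assumes that the asset price $S(t)$ is a linear transformation of a log-normal
random variable $X(t)$:

\[ S(t) = \alpha(t) + \beta(t)\,X(t) \]

where $\beta(t) \ge 0$. Then, the payoff of the European call option is:

\[
\begin{aligned}
f(S(T)) &= (S(T) - K)^+ \\
&= \left( \alpha(T) + \beta(T)\,X(T) - K \right)^+ \\
&= \beta(T) \left( X(T) - \frac{K - \alpha(T)}{\beta(T)} \right)^+
\end{aligned}
\]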


36 The parameters are those given in Example 88.

FIGURE 9.18: Comparing BS and UVM prices of the cliquet option

FIGURE 9.19: Influence of the local cap on the cliquet option price

This type of approach is interesting because the pricing of options can then be done using
the Black-Scholes formula:

\[ C(0, S_0) = \beta(T)\,C_{\mathrm{BS}} \left( X_0, \frac{K - \alpha(T)}{\beta(T)}, \sigma_X, T, b_X, r \right) \]

where $b_X$ and $\sigma_X$ are the drift and diffusion coefficients of $X(t)$ under the risk-neutral
probability measure $\mathbb{Q}$. This modeling framework has been introduced by Rubinstein (1983)
and popularized by Damiano Brigo and Fabio Mercurio in a series of working papers written
between 2000 and 2003³⁷. This model was originally used in order to generate a volatility
skew, but it is now extensively used in interest rate derivatives because it extends the Black
model when facing negative interest rates.


9.2.2.1 The fixed-strike parametrization 
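Let us suppose that:

$$
S(t) = \alpha + \beta \exp\left( \left( \tilde{b}(t) - \frac{1}{2}\sigma^2 \right) t + \sigma W^{\mathbb{Q}}(t) \right)
$$

We have $S_0 = \alpha + \beta$, meaning that:

$$
S(t) = \alpha + (S_0 - \alpha) \exp\left( \left( \tilde{b}(t) - \frac{1}{2}\sigma^2 \right) t + \sigma W^{\mathbb{Q}}(t) \right) \qquad (9.25)
$$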


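Let $b$ be the cost-of-carry parameter of the asset. Under the risk-neutral probability measure,
the martingale condition is:

$$
\mathbb{E}^{\mathbb{Q}}\left[ e^{-bt} S(t) \mid \mathcal{F}_0 \right] = S_0
$$

Since we have $\mathbb{E}^{\mathbb{Q}}[S(t)] = \alpha + (S_0 - \alpha) e^{\tilde{b}(t) t}$, we deduce that the no-arbitrage condition
implies that:

$$
\alpha + (S_0 - \alpha) e^{\tilde{b}(t) t} = S_0 e^{bt}
$$

or:

$$
\tilde{b}(t) = \frac{1}{t} \ln\left( \frac{S_0 e^{bt} - \alpha}{S_0 - \alpha} \right)
$$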


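The payoff of the European call option is:

$$
\begin{aligned}
f(S(T)) &= (S(T) - K)^+ \\
&= \left( (S(T) - \alpha) - (K - \alpha) \right)^+
\end{aligned}
$$

We deduce that the price of the option is given by:

$$
C(0, S_0) = C_{BS}\left( S_0 - \alpha, K - \alpha, \sigma, T, \tilde{b}(T), r \right) \qquad (9.26)
$$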


In Figure 9.20, we report the volatility skew generated by the SLN model when the
current price $S_0$ of the asset is 100, the maturity $T$ is one year, the cost-of-carry $b$ is 5%
and the interest rate $r$ is 5%. We notice that the parameter $\sigma$ of the SLN model is not of
the same magnitude as the implied volatility of the BS model. This is due to the shift $\alpha$.
When $\alpha$ is positive (or negative), we have $\sigma > \Sigma(T,K)$ (or $\sigma < \Sigma(T,K)$).
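The fixed-strike pricing rule (9.26) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`bs_call`, `sln_call_fixed_strike`, `bs_implied_vol`) are hypothetical, the implied volatility is recovered by a plain bisection, and the two $(\alpha, \sigma)$ pairs are illustrative choices rather than the exact settings behind Figure 9.20.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s0, k, sigma, t, b, r):
    """Black-Scholes call price with cost-of-carry b and interest rate r."""
    d1 = (log(s0 / k) + (b + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s0 * exp((b - r) * t) * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def sln_call_fixed_strike(s0, k, sigma, t, b, r, alpha):
    """Equation (9.26): shift spot and strike by alpha and adjust the drift."""
    b_tilde = log((s0 * exp(b * t) - alpha) / (s0 - alpha)) / t
    return bs_call(s0 - alpha, k - alpha, sigma, t, b_tilde, r)

def bs_implied_vol(price, s0, k, t, b, r, lo=1e-6, hi=5.0):
    """Bisection on the BS price, which is increasing in the volatility."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, mid, t, b, r) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Market setting of the text: S0 = 100, T = 1, b = r = 5%.
s0, t, b, r = 100.0, 1.0, 0.05, 0.05
# Illustrative (alpha, sigma) pairs (an assumption, not the figure's inputs).
for alpha, sigma in [(22.0, 0.25), (-70.0, 0.12)]:
    for k in (80.0, 100.0, 120.0):
        price = sln_call_fixed_strike(s0, k, sigma, t, b, r, alpha)
        iv = bs_implied_vol(price, s0, k, t, b, r)
        print(f"alpha={alpha:6.1f} K={k:5.1f} price={price:8.4f} implied vol={iv:.4f}")
```

With a positive shift the implied volatility should come out below $\sigma$, and above $\sigma$ with a negative shift, which is the ordering stated in the paragraph above.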


37 See Brigo and Mercurio (2002a) for a survey of their different works.

FIGURE 9.20: Volatility skew generated by the SLN model (fixed-strike parametrization)


9.2.2.2 The floating-strike parametrization 


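Let us now suppose that:

$$
S(t) = \alpha e^{bt} + \beta e^{\left( b - \frac{1}{2}\sigma^2 \right) t + \sigma W^{\mathbb{Q}}(t)}
$$

We have $S_0 = \alpha + \beta$ and $\mathbb{E}^{\mathbb{Q}}[S(t)] = \alpha e^{bt} + \beta e^{bt}$. We deduce that the stochastic process
$e^{-bt} S(t)$ is an $\mathcal{F}_t$-martingale, and we can write:

$$
S(t) = \alpha e^{bt} + (S_0 - \alpha)\, e^{\left( b - \frac{1}{2}\sigma^2 \right) t + \sigma W^{\mathbb{Q}}(t)} \qquad (9.27)
$$

The payoff of the European call option becomes:

$$
\begin{aligned}
f(S(T)) &= (S(T) - K)^+ \\
&= \left( (S(T) - \alpha e^{bT}) - (K - \alpha e^{bT}) \right)^+
\end{aligned}
$$

It follows that the option price is equal to:

$$
C(0, S_0) = C_{BS}\left( S_0 - \alpha, K - \alpha e^{bT}, \sigma, T, b, r \right) \qquad (9.28)
$$

Examples of volatility skew are given in Figure 9.21 with the same parameters as those
we have used in Figure 9.20.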


Remark 103 At first sight, the floating-strike parametrization seems to be different from
the fixed-strike parametrization. In practice, the parameters $(\alpha, \sigma)$ are calibrated for each
maturity $T$. This explains why the two parametrizations are very close.

FIGURE 9.21: Volatility skew generated by the SLN model (floating-strike parametrization)


9.2.2.3 The forward parametrization 


If we consider the forward price $F(t)$ instead of the spot price $S(t)$, the two parametrizations
coincide because we have $b = 0$. In this case, the dynamics of the forward price is:

$$
dF(t) = \sigma \left( F(t) - \alpha \right) dW^{\mathbb{Q}}(t) \qquad (9.29)
$$

and the price of the option is given by the Black formula³⁸:

$$
C(0, S_0) = C_{Black}\left( F_0 - \alpha, K - \alpha, \sigma, T, r \right) \qquad (9.30)
$$

In Equations (9.29) and (9.30), we impose that $\alpha < F_0$ and $\alpha < K$. This implies that
$F(t) \in [\alpha, \infty)$. This model is appealing for fixed income derivatives, because the interest
rate may be negative when $\alpha$ is negative. In this case, we have:

$$
\begin{aligned}
dF(t) &= \left( \sigma F(t) - \alpha\sigma \right) dW^{\mathbb{Q}}(t) \\
&= \left( \sigma_1 F(t) + \sigma_2 \right) dW^{\mathbb{Q}}(t)
\end{aligned}
$$

where $\sigma_1 = \sigma$ and $\sigma_2 = -\alpha\sigma > 0$. We obtain a stochastic differential equation whose
diffusion coefficient is a mix of log-normal and Gaussian volatilities.


38 We recall that the Black formula can be viewed as a special case of the Black-Scholes formula when the
cost-of-carry parameter $b$ is equal to zero:

$$
C_{Black}(x, K, \sigma, T, r) = C_{BS}(x, K, \sigma, T, 0, r)
$$
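Lee and Wang (2012) prove the following results:

• monotonicity in strike:

$$
\operatorname{sign} \frac{\partial \Sigma(T,K)}{\partial K} = \operatorname{sign} \alpha
$$

• upper and lower bounds:

$$
\begin{cases}
\Sigma(T,K) < \sigma & \text{if } \alpha > 0 \\
\Sigma(T,K) > \sigma & \text{if } \alpha < 0
\end{cases}
$$

• sharpness of bound:

$$
\lim_{K \to \infty} \Sigma(T,K) = \sigma
$$

• short-expiry behavior:

$$
\lim_{T \to 0} \Sigma(T,K) =
\begin{cases}
\dfrac{\sigma \ln(F_0 / K)}{\ln\left( (F_0 - \alpha) / (K - \alpha) \right)} & \text{if } K \neq F_0 \\
\sigma \left( 1 - \alpha F_0^{-1} \right) & \text{if } K = F_0
\end{cases}
$$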


The implied volatility formula does not depend on the maturity T and is only valid 
when T is equal to zero. However, it is a good approximation for other maturities as shown 
in Table 9.6. We use the previous parameters and three different maturities (one-month, 
one-year and five-year). 


TABLE 9.6: Error of the SLN implied volatility formula (in bps)

            (α = 22, σ = 25%)        (α = -70, σ = 12%)
  K        1M     1Y     5Y         1M     1Y     5Y
  80      1.0   11.1   57.0        0.9   12.9   66.0
  90      0.7   10.6   54.1        1.0   11.9   61.4
  100     0.9   10.2   51.6        1.1   11.3   57.3
  110     1.0    9.7   49.6        0.8   10.8   53.8
  120     0.7    9.3   47.7        0.6   10.3   51.3
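As a quick illustration of the short-expiry result of Lee and Wang (2012), the sketch below evaluates the limiting implied volatility on a strike grid. The function name is hypothetical and $F_0 = 100$ is an assumption; the code only evaluates the closed-form limit and does not attempt to reproduce the errors reported in Table 9.6.

```python
from math import log

def sln_short_expiry_vol(f0, k, alpha, sigma):
    """Short-expiry (T -> 0) implied volatility of the SLN model."""
    if abs(k - f0) < 1e-12:
        # At-the-money limit of the ratio of logarithms
        return sigma * (1.0 - alpha / f0)
    return sigma * log(f0 / k) / log((f0 - alpha) / (k - alpha))

f0 = 100.0  # assumed forward price
strikes = (80.0, 90.0, 100.0, 110.0, 120.0)
for alpha, sigma in [(22.0, 0.25), (-70.0, 0.12)]:
    vols = [sln_short_expiry_vol(f0, k, alpha, sigma) for k in strikes]
    print(alpha, sigma, [round(100.0 * v, 2) for v in vols])
```

The printed smiles should be increasing in the strike when $\alpha > 0$ and decreasing when $\alpha < 0$, in line with the monotonicity result.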


9.2.2.4 Mixture of SLN distributions 


One limitation of the SLN model is that it only produces a volatility skew, and not a
volatility smile. In order to obtain a U-shaped curve, Brigo and Mercurio (2002b) suggest
that the (risk-neutral) probability density function $f(x)$ of the asset price is given
by the mixture of known basic densities:

$$
f(x) = \sum_{j=1}^{m} p_j f_j(x)
$$

where $f_j$ is the $j$-th basic density, $p_j > 0$ and $\sum_{j=1}^{m} p_j = 1$. Let $G(S(T))$ be the payoff of
a European option. We have:

$$
\begin{aligned}
C(0, S_0) &= \mathbb{E}^{\mathbb{Q}}\left[ e^{-rT} G(S(T)) \mid \mathcal{F}_0 \right] \\
&= \int e^{-rT} G(x) f(x)\, dx
\end{aligned}
$$
We deduce that:

$$
\begin{aligned}
C(0, S_0) &= \int e^{-rT} G(x) \sum_{j=1}^{m} p_j f_j(x)\, dx \\
&= \sum_{j=1}^{m} p_j \int e^{-rT} G(x) f_j(x)\, dx \\
&= \sum_{j=1}^{m} p_j\, \mathbb{E}^{\mathbb{Q}_j}\left[ e^{-rT} G(S(T)) \mid \mathcal{F}_0 \right]
\end{aligned}
$$

where $\mathbb{Q}_j$ is the $j$-th probability measure. It is then straightforward to price a European
option using formulas of basic models. If we consider a mixture of two shifted log-normal
models, the price of the European call option is equal to:

$$
C(0, S_0) = p \cdot C_{SLN}(S_0, K, \sigma_1, T, b, r, \alpha_1) + (1 - p) \cdot C_{SLN}(S_0, K, \sigma_2, T, b, r, \alpha_2)
$$

where $C_{SLN}$ is the formula of the SLN model³⁹. The model has five parameters: $\sigma_1$, $\sigma_2$, $\alpha_1$,
$\alpha_2$ and $p$.


Example 89 We consider a calibration set of five options, whose strikes and implied volatilities
are equal to:

  K_j         80     90     100      110     120
  Σ(1,K_j)    21%    19%    18.25%   18.5%   19%

The current value of the asset price is equal to 100, the maturity of options is one year, the
cost-of-carry parameter is set to 0 and the interest rate is 5%.


The parameters are estimated by minimizing the weighted least squares:

$$
\min \sum_{j=1}^{n} w_j \left( \hat{C}_j - C_{SLN}(S_0, K_j, \sigma_1, \sigma_2, T_j, b, r, \alpha_1, \alpha_2, p) \right)^2
$$

where:

$$
\hat{C}_j = C_{BS}(S_0, K_j, \Sigma(T_j, K_j), T_j, b, r)
$$

and $w_j$ is the weight of the $j$-th option. We consider three parameterizations: (#1) the
weights $w_j$ are uniform, and we impose that $\alpha_1 = \alpha_2$ and $p = 50\%$; (#2) the weights $w_j$
are uniform, and $p$ is set to 25%; (#3) the weights $w_j$ are inversely proportional to option
prices $\hat{C}_j$, and $p$ is set to 50%. Results are given in Table 9.7 and Figure 9.22. We notice
that $\alpha_1$ and $\alpha_2$ can take large values. Shifted log-normal models are generally presented as
a small perturbation of the Black-Scholes model. In practice, they are very different.
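To make the mixture formula concrete, here is a minimal sketch that prices calls under a mixture of two SLN models and converts the prices back into Black-Scholes implied volatilities. It assumes the floating-strike form (9.28) for each component (with $b = 0$, as in Example 89, the three forms coincide) and plugs in the rounded parameters of parameterization #1 from Table 9.7. The function names are hypothetical and no optimizer is included, so this is a pricing check rather than a calibration.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s0, k, sigma, t, b, r):
    """Black-Scholes call price with cost-of-carry b and interest rate r."""
    d1 = (log(s0 / k) + (b + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s0 * exp((b - r) * t) * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def sln_call(s0, k, sigma, t, b, r, alpha):
    """Floating-strike SLN price, Equation (9.28)."""
    return bs_call(s0 - alpha, k - alpha * exp(b * t), sigma, t, b, r)

def mixed_sln_call(s0, k, t, b, r, p, sigma1, alpha1, sigma2, alpha2):
    """Mixture of two SLN models: convex combination of component prices."""
    return (p * sln_call(s0, k, sigma1, t, b, r, alpha1)
            + (1.0 - p) * sln_call(s0, k, sigma2, t, b, r, alpha2))

def bs_implied_vol(price, s0, k, t, b, r, lo=1e-6, hi=5.0):
    """Bisection on the BS price, which is increasing in the volatility."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, mid, t, b, r) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example 89 market data and the rounded parameters #1 of Table 9.7
s0, t, b, r = 100.0, 1.0, 0.0, 0.05
params = dict(p=0.5, sigma1=0.165, alpha1=-53.3, sigma2=0.073, alpha2=-53.3)
for k in (80.0, 90.0, 100.0, 110.0, 120.0):
    price = mixed_sln_call(s0, k, t, b, r, **params)
    print(k, round(100.0 * bs_implied_vol(price, s0, k, t, b, r), 2))
```

Because the table values are rounded, the recovered smile can only be expected to lie near the quotes of Example 89, for instance close to 18.25% at the strike 100.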


9.2.2.5 Application to binary, corridor and barrier options 


One of the difficulties when using the Black-Scholes model with exotic options is the
choice of the implied volatility. In the case of a European call option, the natural choice is
the implied volatility $\Sigma(T,K)$ that corresponds to the strike and the maturity of the
option. In the case of a double barrier option, we can use the implied volatility $\Sigma(T,K)$
that corresponds to the strike of the option, the implied volatility $\Sigma(T,L)$ that corresponds
to the lower barrier of the option, the implied volatility $\Sigma(T,H)$ that corresponds to the
higher barrier of the option, or another implied volatility. In fact, there is no satisfactory
answer.

39 It corresponds to one of the three expressions (9.26), (9.28), and (9.30).

TABLE 9.7: Calibrated parameters of the mixed SLN model

  Model     #1        #2        #3
  σ_1      16.5%      8.2%     10.2%
  σ_2       7.3%     17.2%     21.7%
  α_1     -53.3    -289.7    -145.2
  α_2     -53.3      19.6      47.4
  p        50.0%     25.0%     50.0%

FIGURE 9.22: Implied volatility (in %) of calibrated mixed SLN models


Let $S(t)$ be the asset price at time $t$. The payoff of the binary cash-or-nothing call option
is:

$$
f(S(T)) = \mathbb{1}\{S(T) > K\}
$$

We deduce that:

$$
BCC(0, S_0) = \mathbb{E}^{\mathbb{Q}}\left[ e^{-\int_0^T r\, dt}\, \mathbb{1}\{S(T) > K\} \mid \mathcal{F}_0 \right]
$$

If we consider the Black-Scholes model, we obtain:

$$
BCC(0, S_0) = e^{-rT}\, \Phi(d_2)
$$

We can replicate this option by using the classical dynamic delta hedging approach presented
on page 495. Here, we consider another framework, which is called the static hedging
method. The hedging portfolio consists of:
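• a long position on the European call option with strike $K$;
• a short position on the European call option with strike $K + \varepsilon$.

If the notional of each option is set to $1/\varepsilon$, the value of the hedging portfolio at time $t$ is equal
to:

$$
V(t) = \frac{1}{\varepsilon} C(t, S(t), K) - \frac{1}{\varepsilon} C(t, S(t), K + \varepsilon)
$$

It follows that the value of the hedging strategy is equal to:

$$
X(t) = BCC(t, S(t)) - V(t)
$$

We notice that:

$$
\begin{aligned}
\lim_{\varepsilon \to 0} X(T) &= BCC(T, S(T)) - \lim_{\varepsilon \to 0} V(T) \\
&= \mathbb{1}\{S(T) > K\} - \lim_{\varepsilon \to 0} \frac{(S(T) - K)^+ - (S(T) - K - \varepsilon)^+}{\varepsilon} \\
&= \mathbb{1}\{S(T) > K\} - \mathbb{1}\{S(T) > K\} \\
&= 0
\end{aligned}
$$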
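The no-arbitrage condition implies that:

$$
\begin{aligned}
BCC(t, S(t)) &= \lim_{\varepsilon \to 0} \frac{C(t, S(t), K) - C(t, S(t), K + \varepsilon)}{\varepsilon} \\
&= -\lim_{\varepsilon \to 0} \frac{C(t, S(t), K + \varepsilon) - C(t, S(t), K)}{\varepsilon} \\
&= -\frac{\partial C(t, S(t), K)}{\partial K}
\end{aligned}
$$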
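This result is valid only if the volatility is constant. If the volatility is not constant, the
price $BCC(t, S(t))$ becomes:

$$
\begin{aligned}
BCC(t, S(t)) &= \lim_{\varepsilon \to 0} \frac{C(t, S(t), K, \Sigma(T,K)) - C(t, S(t), K + \varepsilon, \Sigma(T, K + \varepsilon))}{\varepsilon} \\
&= -\frac{\partial C(t, S(t), K, \Sigma(T,K))}{\partial K} - \frac{\partial C(t, S(t), K, \Sigma(T,K))}{\partial \Sigma} \cdot \frac{\partial \Sigma(T,K)}{\partial K} \\
&= BCC_{BS}(t, S(t), \Sigma(T,K)) - \upsilon_{BS}(t, S(t), \Sigma(T,K)) \cdot \omega(T,K)
\end{aligned}
$$

where $BCC_{BS}(t, S(t), \Sigma(T,K))$ is the Black-Scholes price with implied volatility $\Sigma(T,K)$,
$\upsilon_{BS}(t, S(t), \Sigma(T,K))$ is the Black-Scholes vega for the European call option and $\omega(T,K)$
is the skew of the volatility surface:

$$
\omega(T,K) = \frac{\partial \Sigma(T,K)}{\partial K}
$$

This framework, called the skew-method (SM) model, shows that taking into account the
volatility smile cannot be reduced to choosing the right implied volatility, because the price
also depends on the skew $\omega(T,K)$ through the vega term.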

Example 90 We price a binary call option when the underlying asset price is 100, the
maturity of the option is 6 months, and the parameters $b$ and $r$ are equal to 5%. The skew
$\omega(T,K)$ of the implied volatility can take the values 0, -20 and +20 bps. We consider two
cases for the implied volatility: (1) $\Sigma(T,K)$ is equal to 20%, (2) $\Sigma(T,K)$ is a linear function
with respect to $K$:

$$
\Sigma(T,K) = \Sigma(T, S_0) + \omega(T,K) \cdot (K - S_0)
$$


FIGURE 9.23: Impact of the implied volatility skew on the binary option price (first panel: volatility = 20%; curves for ω(T,K) = 0, -20 bps and +20 bps, plotted against the strike)


Figure 9.23 represents the relationship between the binary call option price $BCC(0, S_0)$
and the strike $K$. The first panel assumes that the implied volatility $\Sigma(T,K)$ is equal to
20%. Since the vega is positive, we verify that:

$$
\begin{aligned}
\omega(T,K) < 0 &\Rightarrow BCC_{SM}(0, S_0) > BCC_{BS}(0, S_0, \Sigma(T,K)) \\
\omega(T,K) > 0 &\Rightarrow BCC_{SM}(0, S_0) < BCC_{BS}(0, S_0, \Sigma(T,K))
\end{aligned}
$$

However, the results shown in the first panel may be misleading, because it is not possible
to compare the price for two different strikes. Indeed, if $K_2 > K_1$ and $\omega(T,K) > 0$
for every strike $K$, this implies that $\Sigma(T,K_2) > \Sigma(T,K_1)$, $BCC_{BS}(0, S_0, \Sigma(T,K_2)) >
BCC_{BS}(0, S_0, \Sigma(T,K_1))$, but $\upsilon_{BS}(0, S_0, \Sigma(T,K_2)) > \upsilon_{BS}(0, S_0, \Sigma(T,K_1))$. A higher implied
volatility increases the binary option price through its impact on the Black-Scholes
price, but also reduces it through its impact on the vega. Therefore, the second and third
panels are more useful to understand the dynamics of the binary option price with respect
to the strike. We observe that it is more complex because of the two contrary effects.


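We now assume that the shifted log-normal model is the right model. We have:

$$
\begin{aligned}
\mathbb{1}\{S(T) > K\} &= \mathbb{1}\{\alpha(T) + \beta(T) X(T) > K\} \\
&= \mathbb{1}\left\{ (S_0 - \alpha)\, e^{\left( b - \frac{1}{2}\sigma^2 \right) T + \sigma W^{\mathbb{Q}}(T)} > K - \alpha e^{bT} \right\}
\end{aligned}
$$

We deduce that:

$$
BCC_{SLN}(0, S_0) = f_{BS}\left( S_0 - \alpha, K - \alpha e^{bT}, \sigma, T, b, r \right) \qquad (9.31)
$$

where $f_{BS}$ is the Black-Scholes formula of the BCC option. Equation (9.31) amounts to
shifting the current price and the option strike.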
TABLE 9.8: Price of the binary call option (α = -50, σ = 15%)

  K     Σ(T,K) (in %)   ω(T,K) (in bps)   BS       SLN      SM
  80    23.64           -5.47             0.8087   0.8184   0.8184
  90    23.14           -4.57             0.6761   0.6895   0.6895
  100   22.72           -3.87             0.5160   0.5306   0.5306
  110   22.36           -3.34             0.3582   0.3715   0.3715
  120   22.05           -2.92             0.2271   0.2374   0.2374

TABLE 9.9: Price of the binary call option (α = 50, σ = 40%)

  K     Σ(T,K) (in %)   ω(T,K) (in bps)   BS       SLN      SM
  80    16.71           17.25             0.8937   0.8780   0.8780
  90    18.21           13.13             0.7390   0.7055   0.7055
  100   19.39           10.51             0.5364   0.4971   0.4971
  110   20.34            8.69             0.3546   0.3202   0.3202
  120   21.14            7.35             0.2209   0.1953   0.1953


We consider the following parameters: $S_0 = 100$, $T = 1$, $b = 5\%$ and $r = 5\%$. The
SLN parameters $\alpha$ and $\sigma$ are equal to -50 and 15%. In Table 9.8, we price the binary
call option with three models: the Black-Scholes model with the implied volatility $\Sigma(T,K)$,
the SLN model and the SM approximation using the implied volatility $\Sigma(T,K)$ and the
volatility skew $\omega(T,K)$. We remark that the Black-Scholes model produces inaccurate option
prices, whereas the SM prices are equal to those obtained with the SLN model. We obtain
the same conclusion with an increasing smile as shown in Table 9.9.
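The comparison between the three binary prices can be sketched as follows for the strike $K = 100$ and the parameters of Table 9.8. This is an illustrative reconstruction under assumptions: the smile is generated from the floating-strike SLN call price (9.28), the skew $\omega(T,K)$ is estimated by a central finite difference with a hypothetical step `h`, and all function names are made up for the example.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def d1_d2(s0, k, sigma, t, b):
    d1 = (log(s0 / k) + (b + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    return d1, d1 - sigma * sqrt(t)

def bs_call(s0, k, sigma, t, b, r):
    d1, d2 = d1_d2(s0, k, sigma, t, b)
    return s0 * exp((b - r) * t) * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def bs_binary_call(s0, k, sigma, t, b, r):
    """Black-Scholes cash-or-nothing call: exp(-rT) * Phi(d2)."""
    return exp(-r * t) * norm_cdf(d1_d2(s0, k, sigma, t, b)[1])

def bs_vega(s0, k, sigma, t, b, r):
    return s0 * exp((b - r) * t) * norm_pdf(d1_d2(s0, k, sigma, t, b)[0]) * sqrt(t)

def bs_implied_vol(price, s0, k, t, b, r, lo=1e-6, hi=5.0):
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, mid, t, b, r) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s0, t, b, r, alpha, sigma = 100.0, 1.0, 0.05, 0.05, -50.0, 0.15

def smile(k):
    """Implied volatility of the SLN call price (9.28) at strike k."""
    sln_price = bs_call(s0 - alpha, k - alpha * exp(b * t), sigma, t, b, r)
    return bs_implied_vol(sln_price, s0, k, t, b, r)

k, h = 100.0, 0.01                      # h: finite-difference step (assumption)
vol = smile(k)
skew = (smile(k + h) - smile(k - h)) / (2.0 * h)
bcc_bs = bs_binary_call(s0, k, vol, t, b, r)
bcc_sm = bcc_bs - bs_vega(s0, k, vol, t, b, r) * skew
bcc_sln = bs_binary_call(s0 - alpha, k - alpha * exp(b * t), sigma, t, b, r)  # (9.31)
print(f"vol={vol:.4f} skew={1e4 * skew:.2f} bps "
      f"BS={bcc_bs:.4f} SLN={bcc_sln:.4f} SM={bcc_sm:.4f}")
```

The output should be close to the $K = 100$ row of Table 9.8: the SM and SLN prices agree up to the finite-difference error, while the plain Black-Scholes price is lower because the skew is negative.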


The previous analysis can be extended to many other payoffs including corridor and
barrier options. For instance, the holder of a corridor option receives a coupon at maturity,
the magnitude of which depends on the behavior of a specified spot rate during the lifetime
of the corridor. A special case is the range binary corridor option that pays a fixed coupon
$c$ if the asset stays within the range $[L, H]$:

$$
f(S(T)) = c \sum_{j=1}^{n} \mathbb{1}\{S(T_j) \in [L, H]\}
$$

where $\{T_1, \ldots, T_n\}$ are the fixing dates of the corridor option. Since we have:

$$
\begin{aligned}
\mathbb{1}\{S(T_j) \in [L, H]\} &= \mathbb{1}\{L < S(T_j) < H\} \\
&= \mathbb{1}\{S(T_j) > L\} - \mathbb{1}\{S(T_j) \geq H\}
\end{aligned}
$$

we deduce that the price $CC(0, S_0)$ is related to a series of BCC cash flows:

$$
CC(0, S_0) = c \sum_{j=1}^{n} \left( BCC(0, S_0, L) - BCC(0, S_0, H) \right)
$$

where $BCC(0, S_0, K)$ is the price of the cash-or-nothing binary call option, whose strike
is $K$, the $j$-th term being evaluated for the fixing date $T_j$. We can then use SLN,
mixed-SLN or SM models in order to take into account the volatility smile.
Remark 104 In the case of barrier options, we can use the Black-Scholes formulas of
Rubinstein and Reiner (1991) by shifting the parameters $S_0$, $K$, $L$ and $H$.


9.2.3 Local volatility model 


The local volatility model has been proposed by Dupire (1994) using continuous-time
modeling, and by Derman and Kani (1994) in a binomial tree framework. It is one of the
most famous smile models, along with the Heston and SABR models. We assume that the risk-neutral
dynamics of the asset price is given by the following SDE:

$$
dS(t) = b S(t)\, dt + \sigma(t, S(t))\, S(t)\, dW^{\mathbb{Q}}(t)
$$

We can then retrieve the local volatility surface $\sigma(t, S)$ from the implied volatility surface
$\Sigma(T, K)$, because the knowledge of all European option prices is sufficient to estimate the
unique risk-neutral diffusion (Dupire, 1994).


9.2.3.1 Derivation of the forward equation 


The Fokker-Planck equation. Using Appendix A.3.6 on page 1072, the risk-neutral
probability density function $q_t(T, S)$ of the asset price $S(T)$ satisfies the forward
Chapman-Kolmogorov equation:

$$
\frac{\partial q_t(T, S)}{\partial T} = -\frac{\partial \left[ b S q_t(T, S) \right]}{\partial S} + \frac{1}{2} \frac{\partial^2 \left[ \sigma^2(T, S) S^2 q_t(T, S) \right]}{\partial S^2}
$$

The initial condition is:

$$
q_t(t, S) = \mathbb{1}\{S = S_t\}
$$

where $S_t$ is the value of $S(t)$ that is known at time $t$.
The Breeden-Litzenberger formulas. On page 508, we have seen that the risk-neutral
probability measure is related to the prices of European options. In particular, we have
found that:

$$
\begin{aligned}
C_t(T, K) &= e^{-r(T-t)} \int_K^{\infty} (S - K)\, q_t(T, S)\, dS \\
\frac{\partial C_t(T, K)}{\partial K} &= -e^{-r(T-t)} \int_K^{\infty} q_t(T, S)\, dS \\
\frac{\partial^2 C_t(T, K)}{\partial K^2} &= e^{-r(T-t)}\, q_t(T, K)
\end{aligned}
$$


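Main result. We also have:

$$
\begin{aligned}
\frac{\partial C_t(T, K)}{\partial T} &= -r C_t(T, K) + e^{-r(T-t)} \int_K^{\infty} (S - K) \frac{\partial q_t(T, S)}{\partial T}\, dS \\
&= -r C_t(T, K) + e^{-r(T-t)}\, \mathcal{I}
\end{aligned}
$$

Using the Fokker-Planck equation, we obtain:

$$
\begin{aligned}
\mathcal{I} &= \int_K^{\infty} (S - K) \left( \frac{1}{2} \frac{\partial^2 \left[ \sigma^2(T, S) S^2 q_t(T, S) \right]}{\partial S^2} - \frac{\partial \left[ b S q_t(T, S) \right]}{\partial S} \right) dS \\
&= \frac{1}{2} \int_K^{\infty} (S - K) \frac{\partial^2 \left[ \sigma^2(T, S) S^2 q_t(T, S) \right]}{\partial S^2}\, dS - \int_K^{\infty} (S - K) \frac{\partial \left[ b S q_t(T, S) \right]}{\partial S}\, dS \\
&= \frac{1}{2} \mathcal{I}_1 - \mathcal{I}_2
\end{aligned}
$$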
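Using an integration by parts, we have:

$$
\begin{aligned}
\mathcal{I}_1 &= \int_K^{\infty} (S - K) \frac{\partial^2 \left[ \sigma^2(T, S) S^2 q_t(T, S) \right]}{\partial S^2}\, dS \\
&= \left[ (S - K) \frac{\partial \left[ \sigma^2(T, S) S^2 q_t(T, S) \right]}{\partial S} \right]_K^{\infty} - \int_K^{\infty} \frac{\partial \left[ \sigma^2(T, S) S^2 q_t(T, S) \right]}{\partial S}\, dS \\
&= 0 - \left[ \sigma^2(T, S) S^2 q_t(T, S) \right]_K^{\infty} \\
&= \sigma^2(T, K) K^2 q_t(T, K)
\end{aligned}
$$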
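We notice that⁴⁰:

$$
\begin{aligned}
\mathcal{I}_2 &= \int_K^{\infty} (S - K) \frac{\partial \left[ b S q_t(T, S) \right]}{\partial S}\, dS \\
&= \left[ (S - K)\, b S q_t(T, S) \right]_K^{\infty} - b \int_K^{\infty} S q_t(T, S)\, dS \\
&= -b \int_K^{\infty} S q_t(T, S)\, dS \\
&= -b e^{r(T-t)} \left( C_t(T, K) - K \frac{\partial C_t(T, K)}{\partial K} \right)
\end{aligned}
$$

The expression of $\mathcal{I}$ is then equal to:

$$
\mathcal{I} = \frac{1}{2} \sigma^2(T, K) K^2 q_t(T, K) + b e^{r(T-t)} \left( C_t(T, K) - K \frac{\partial C_t(T, K)}{\partial K} \right)
$$

40 Using Breeden-Litzenberger formulas, we have:

$$
\begin{aligned}
e^{r(T-t)} C_t(T, K) &= \int_K^{\infty} (S - K)\, q_t(T, S)\, dS \\
&= \int_K^{\infty} S q_t(T, S)\, dS - K \int_K^{\infty} q_t(T, S)\, dS \\
&= \int_K^{\infty} S q_t(T, S)\, dS + K e^{r(T-t)} \frac{\partial C_t(T, K)}{\partial K}
\end{aligned}
$$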
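It follows that:

$$
\frac{\partial C_t(T, K)}{\partial T} = -r C_t(T, K) + \frac{1}{2} \sigma^2(T, K) K^2 \frac{\partial^2 C_t(T, K)}{\partial K^2} + b \left( C_t(T, K) - K \frac{\partial C_t(T, K)}{\partial K} \right)
$$

We conclude that:

$$
\frac{1}{2} \sigma^2(T, K) K^2 \frac{\partial^2 C_t(T, K)}{\partial K^2} - b K \frac{\partial C_t(T, K)}{\partial K} - \frac{\partial C_t(T, K)}{\partial T} + (b - r)\, C_t(T, K) = 0 \qquad (9.32)
$$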


Differences between backward and forward PDE approaches. Equation (9.32) is
very important because it can be interpreted as the dual of the backward PDE (9.2):

$$
\begin{cases}
\frac{1}{2} \sigma^2(t, S) S^2 \partial_S^2 V(t, S) + b S \partial_S V(t, S) + \partial_t V(t, S) - r V(t, S) = 0 \\
V(T, S(T)) = f(T, S(T), K)
\end{cases}
$$

where $V(t, S)$ is the price of the European option, whose terminal payoff is $f(T, S(T), K)$.
In the case of the Dupire model, the pricing formula becomes:

$$
\begin{cases}
\frac{1}{2} \sigma^2(T, K) K^2 \partial_K^2 V(T, K) - b K \partial_K V(T, K) - \partial_T V(T, K) + (b - r)\, V(T, K) = 0 \\
V(t, K) = f(t, S_t, K)
\end{cases}
$$

where $V(T, K)$ is the price of the European option, whose initial payoff is $f(t, S_t, K)$. In
the backward formulation, the state variables are $t$ and $S$, whereas the fixed variables are
$T$ and $K$. In the forward formulation, the state variables become $T$ and $K$, whereas the
fixed variables are now the current time⁴¹ $t$ and the current asset price $S_t$. This is not the
only difference between the two approaches. Indeed, the backward PDE approach suggests
that we can hedge the option using a dynamic portfolio of the underlying asset, whereas
the forward PDE approach suggests that we can hedge the option using a static portfolio
of call and put options.


We consider the pricing of a European call option with the following parameters: $S_0 =
100$, $K = 100$, $\sigma(t, S) = 20\%$, $T = 0.5$, $b = 2\%$ and $r = 5\%$. In the case of the backward
PDE, we consider the usual boundary conditions:

$$
\begin{cases}
C(t, 0) = 0 \\
\partial_S C(t, +\infty) = 1
\end{cases}
$$

For the forward PDE, the boundary conditions are⁴²:

$$
\begin{cases}
\partial_K C(T, 0) = -1 \\
C(T, +\infty) = 0
\end{cases}
$$

In Figure 9.24, we show the relative error (expressed in bps) of numerical solutions when
considering the Crank-Nicolson scheme. In the case of the backward PDE, the state variable
is the current asset price $S_0$, and we obtain all the option prices when the strike is equal to
100. In the case of the forward PDE, the state variable is the strike $K$, and we obtain all the
option prices when the current asset price is equal to 100. We notice that the relative errors
are equivalent when $S_0$ is equal to $K$. In fact, the efficiency of the numerical algorithms will
depend on the relative position between $S_0$ and $K$.

41 $t$ can be equal to zero.
42 We can also use the following specifications:

$$
\begin{cases}
C(T, 0) = e^{(b - r) T} S_0 \\
\partial_K C(T, +\infty) = 0
\end{cases}
$$

FIGURE 9.24: Relative error of backward and forward PDE numerical solutions


9.2.3.2 Duality between local volatility and implied volatility 


We can invert Equation (9.32) in order to relate the expression of the local volatility
and the price of the call option:

$$
\sigma^2(T, K) = 2\, \frac{b K \partial_K C(T, K) + \partial_T C(T, K) - (b - r)\, C(T, K)}{K^2 \partial_K^2 C(T, K)}
$$
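The inversion of Equation (9.32) can be checked numerically. The sketch below is a sanity check under assumptions, not a production calibration: it approximates the partial derivatives of the call price by central finite differences (the steps `dk` and `dt` are arbitrary choices) and verifies that a flat Black-Scholes price surface with 20% volatility gives back a local volatility of about 20%.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s0, k, sigma, t, b, r):
    """Black-Scholes call price with cost-of-carry b and interest rate r."""
    d1 = (log(s0 / k) + (b + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s0 * exp((b - r) * t) * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def dupire_local_vol(call, t, k, b, r, dk=0.1, dt=1e-3):
    """Local volatility from a call price surface call(T, K)."""
    c = call(t, k)
    c_t = (call(t + dt, k) - call(t - dt, k)) / (2.0 * dt)
    c_k = (call(t, k + dk) - call(t, k - dk)) / (2.0 * dk)
    c_kk = (call(t, k + dk) - 2.0 * c + call(t, k - dk)) / dk ** 2
    numerator = b * k * c_k + c_t - (b - r) * c
    return sqrt(2.0 * numerator / (k ** 2 * c_kk))

# Flat 20% surface with S0 = 100, b = 2% and r = 5%
s0, b, r = 100.0, 0.02, 0.05

def flat_surface(t, k):
    return bs_call(s0, k, 0.20, t, b, r)

for k in (90.0, 100.0, 110.0):
    print(k, round(dupire_local_vol(flat_surface, 0.5, k, b, r), 4))
```

On real data, the second derivative in the denominator is very sensitive to noise in the quoted prices, so any use beyond this check would need a smooth price or volatility surface.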


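In Exercise 9.4.8 on page 599, we show that $\sigma(T, K)$ can also be written with respect to
the implied volatility $\Sigma(T, K)$:

$$
\sigma(T, K) = \sqrt{\frac{A(T, K)}{B(T, K)}} \qquad (9.33)
$$

where:

$$
A(T, K) = \Sigma^2(T, K) + 2 b K T\, \Sigma(T, K)\, \partial_K \Sigma(T, K) + 2 T\, \Sigma(T, K)\, \partial_T \Sigma(T, K)
$$

and:

$$
B(T, K) = 1 + 2 K \sqrt{T}\, d_1\, \partial_K \Sigma(T, K) + K^2 T\, \Sigma(T, K)\, \partial_K^2 \Sigma(T, K) + K^2 T\, d_1 d_2 \left( \partial_K \Sigma(T, K) \right)^2
$$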
Equation (9.33) is the key finding of Dupire (1994). Indeed, knowing the implied volatility
surface, we can retrieve the unique local volatility function that matches the set of all
European call and put option prices.
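As an illustration of Equation (9.33), the following sketch evaluates the local volatility from an implied volatility and its partial derivatives. It assumes that $d_1$ and $d_2$ are the usual Black-Scholes quantities computed with the implied volatility $\Sigma(T,K)$ and the cost-of-carry $b$, and it uses a quadratic smile $\Sigma_0 + \alpha (S_0 - K)^2$ with $\Sigma_0 = 20\%$, $\alpha = 1$ bp, $S_0 = 100$ and $b = 5\%$, whose derivatives are known in closed form. The function names and the maturity grid are hypothetical.

```python
from math import log, sqrt

def local_vol_from_implied(s0, k, t, b, vol, dvol_dk, d2vol_dk2, dvol_dt):
    """Equation (9.33): sigma(T, K) = sqrt(A / B)."""
    d1 = (log(s0 / k) + b * t) / (vol * sqrt(t)) + 0.5 * vol * sqrt(t)
    d2 = d1 - vol * sqrt(t)
    a = vol ** 2 + 2.0 * b * k * t * vol * dvol_dk + 2.0 * t * vol * dvol_dt
    bb = (1.0
          + 2.0 * k * sqrt(t) * d1 * dvol_dk
          + k ** 2 * t * vol * d2vol_dk2
          + k ** 2 * t * d1 * d2 * dvol_dk ** 2)
    return sqrt(a / bb)

# Quadratic smile in the strike, constant in the maturity
s0, b, sigma0, alpha = 100.0, 0.05, 0.20, 1e-4

def smile(k):
    return sigma0 + alpha * (s0 - k) ** 2

for t in (1.0 / 12.0, 0.5, 3.0):
    row = []
    for k in (80.0, 100.0, 120.0):
        lv = local_vol_from_implied(s0, k, t, b, smile(k),
                                    -2.0 * alpha * (s0 - k), 2.0 * alpha, 0.0)
        row.append(round(100.0 * lv, 2))
    print(f"T={t:.3f}", row)
```

Even though this smile does not depend on the maturity, the local volatility printed for each strike changes with $T$, and a flat smile returns the implied volatility itself.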

Many results have been derived from Equation (9.33). For instance, if there is no skew⁴³,
the local volatility function does not depend on the strike⁴⁴:

$$
\sigma^2(T) = \Sigma^2(T) + 2 T\, \Sigma(T)\, \frac{\partial \Sigma(T)}{\partial T} \qquad (9.34)
$$

On the contrary, the local volatility always depends on the maturity $T$ even if there is no
time-variation in the implied volatility⁴⁵.


FIGURE 9.25: Calibrated local volatility σ(T, S) (in %) (curves: implied volatility and local volatility for several maturities, plotted against the strike)


Example 91 We assume that the implied volatility is equal to:

$$
\Sigma(T, K) = \Sigma_0 + \alpha (S_0 - K)^2
$$

where $\Sigma_0 = 20\%$, $\alpha = 1$ bp, $S_0 = 100$ and $b = 5\%$.


Figure 9.25 shows the calibrated local volatility for different values of $T$. We verify the
time-variation property of the local volatility. We notice that Equation (9.34) is equivalent
to:

$$
\sigma^2(T) = \frac{\partial \left( T\, \Sigma^2(T) \right)}{\partial T}
$$

or:

$$
\Sigma^2(T) = \frac{1}{T} \int_0^T \sigma^2(t)\, dt
$$

The implied variance is then the time average of the local variance.
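A small numerical illustration of this averaging property, with a piecewise-constant local volatility chosen purely for the example (20% during the first year, 30% during the second year); the function name is hypothetical:

```python
from math import sqrt

def implied_vol_from_local(local_vols, year_fractions):
    """Implied volatility as the root of the time-averaged local variance."""
    total_variance = sum(s ** 2 * dt for s, dt in zip(local_vols, year_fractions))
    return sqrt(total_variance / sum(year_fractions))

# Two-year implied volatility: sqrt((0.04 + 0.09) / 2)
print(implied_vol_from_local([0.20, 0.30], [1.0, 1.0]))
```

The two-year implied volatility is about 25.5%, which is slightly above the arithmetic mean of the two local volatilities because the average is taken on variances.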


43 We have $\Sigma(T, K) = \Sigma(T)$.
44 This result is obtained by setting $\partial_K \Sigma(T, K)$ and $\partial_K^2 \Sigma(T, K)$ equal to 0 in Equation (9.33).
45 We have $\Sigma(T, K) = \Sigma(K)$.

Another important result concerns the behavior of the implied volatility near expiry.
Let $x$ be the log-moneyness:

$$
x = \varphi(T, K) = \ln \frac{S_0}{K} + bT
$$

We introduce the functions $\tilde{\Sigma}$ and $\tilde{\sigma}$ such that $\Sigma(T, K) = \tilde{\Sigma}(T, \varphi(T, K))$ and $\sigma(T, K) =
\tilde{\sigma}(T, \varphi(T, K))$. Berestycki et al. (2002) showed that the implied volatility is the harmonic
mean of the local volatility⁴⁶:

$$
\frac{1}{\tilde{\Sigma}(0, x)} = \int_0^1 \frac{dy}{\tilde{\sigma}(0, xy)}
$$

It follows that:

$$
\frac{\partial \tilde{\Sigma}(0, 0)}{\partial x} = \frac{1}{2} \frac{\partial \tilde{\sigma}(0, 0)}{\partial x}
$$

The ATM slope of the implied volatility near expiry is equal to one half the slope of the
local volatility.


9.2.3.3 Dupire model in practice 


One of the problems is the availability of the call/put prices for all maturities and all
strikes. In practice, we only know the option price for some maturities $T_m$ and some strikes
$K_i$. This is why we have to use a calibration method to obtain the continuous volatility
surface $\Sigma(T, K)$.


Time interpolation. We note $v(T, K)$ the total implied variance:

$$
v(T, K) = T\, \Sigma^2(T, K)
$$

The linear interpolation of the total implied variance gives:

$$
v(T, K) = w \cdot v(T_m, K_m(T)) + (1 - w) \cdot v(T_{m+1}, K_{m+1}(T))
$$

where $T \in [T_m, T_{m+1}]$ and:

$$
w = \frac{T_{m+1} - T}{T_{m+1} - T_m}
$$

We deduce that:

$$
\begin{aligned}
\Sigma^2(T, K) &= \frac{T_m (T_{m+1} - T)}{T (T_{m+1} - T_m)}\, \Sigma^2(T_m, K_m(T)) + \frac{T_{m+1} (T - T_m)}{T (T_{m+1} - T_m)}\, \Sigma^2(T_{m+1}, K_{m+1}(T)) \\
&= a_m(T)\, \Sigma^2(T_m, K_m(T)) + b_{m+1}(T)\, \Sigma^2(T_{m+1}, K_{m+1}(T))
\end{aligned}
$$

where:

$$
a_m(T) = \frac{T_m (T_{m+1} - T)}{T (T_{m+1} - T_m)}
$$

and:

$$
b_{m+1}(T) = \frac{T_{m+1} (T - T_m)}{T (T_{m+1} - T_m)} = 1 - a_m(T)
$$

In the previous scheme, we interpolate the total variance for the strike $K$ and the maturity $T$
by considering the pairs $(T_m, K_m(T))$ and $(T_{m+1}, K_{m+1}(T))$. Generally, the strikes $K_m(T)$
and $K_{m+1}(T)$ are a rescaling of the strike $K$:

$$
\begin{cases}
K_m(T) = k_m(T) \cdot K \\
K_{m+1}(T) = k_{m+1}(T) \cdot K
\end{cases}
$$

with $k_m(T_m) = 1$ and $k_{m+1}(T_{m+1}) = 1$. The simplest rule is $k_m(T) = k_{m+1}(T) = 1$.
Another method is to define $k_m(T) = e^{-b (T - T_m)} \leq 1$ and $k_{m+1}(T) = e^{b (T_{m+1} - T)} \geq 1$.

46 See Exercise 9.4.8 on page 599 for the proof of this result.
Example 92 We assume that the implied volatility is equal to:

$$
\Sigma(T_m, K) = \Sigma_m + \alpha_m (K - 100)^2
$$

where $\Sigma_m = 20\% + 0.005 \cdot (T_m - 1.0)$, $\alpha_m = 0.05 \cdot T_m$ bps and $T_m$ is equal to 1, 2, 3, 4 and
5 years. The cost-of-carry parameter $b$ is set to 5%.

We have represented the implied volatility $\Sigma(T_m, K)$ in the first panel in Figure 9.26. We
can then compute the volatility surface. When $T$ is lower than the first observed maturity or
higher than the last observed maturity, we can extrapolate the implied volatility in several
ways. The simplest method is to assume that the implied volatility is constant. In the third
panel, we have reported the interpolated implied volatility with respect to the maturity $T$
for three different strikes. We notice that it is curved between two interpolating knots due
to the effect of the square root transformation.
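The time interpolation scheme can be sketched as follows for the smile of Example 92. The sketch assumes the simplest strike rule $k_m(T) = k_{m+1}(T) = 1$ and a constant extrapolation of the implied volatility outside the quoted maturities; the function names are hypothetical.

```python
from math import sqrt

MATURITIES = [1.0, 2.0, 3.0, 4.0, 5.0]

def slice_vol(t_m, k):
    """Implied volatility of Example 92 at the quoted maturity t_m."""
    sigma_m = 0.20 + 0.005 * (t_m - 1.0)
    alpha_m = 0.05 * t_m * 1e-4          # expressed in bps in the text
    return sigma_m + alpha_m * (k - 100.0) ** 2

def interpolated_vol(t, k):
    """Linear interpolation of the total variance, assuming k_m(T) = 1."""
    if t <= MATURITIES[0]:               # constant extrapolation (short end)
        return slice_vol(MATURITIES[0], k)
    if t >= MATURITIES[-1]:              # constant extrapolation (long end)
        return slice_vol(MATURITIES[-1], k)
    for t_lo, t_hi in zip(MATURITIES, MATURITIES[1:]):
        if t_lo <= t <= t_hi:
            w = (t_hi - t) / (t_hi - t_lo)
            total_var = (w * t_lo * slice_vol(t_lo, k) ** 2
                         + (1.0 - w) * t_hi * slice_vol(t_hi, k) ** 2)
            return sqrt(total_var / t)

for t in (0.5, 1.0, 1.5, 2.0, 4.5):
    print(t, [round(100.0 * interpolated_vol(t, k), 3) for k in (80.0, 100.0, 120.0)])
```

The interpolation is linear in $T \Sigma^2$, not in $\Sigma$, which is why the interpolated volatility is curved between two knots.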
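FIGURE 9.26: Time interpolation of the implied volatility

Non-parametric interpolation. We note $S_m(K)$ the non-parametric function that gives
the value of $\Sigma(T_m, K)$ for all values of strike $K$. The calculation of the local volatility
surface requires the quantities $\partial_K \Sigma(T, K)$, $\partial_K^2 \Sigma(T, K)$ and $\partial_T \Sigma(T, K)$. We
use the shortened notations: $S_m = S_m(K_m(T))$, $S'_m = S'_m(K_m(T))$, $S''_m = S''_m(K_m(T))$,
$S_{m+1} = S_{m+1}(K_{m+1}(T))$, $S'_{m+1} = S'_{m+1}(K_{m+1}(T))$ and $S''_{m+1} = S''_{m+1}(K_{m+1}(T))$. We
have:

$$
\begin{aligned}
\Sigma(T, K)\, \partial_K \Sigma(T, K) &= \frac{1}{2} \partial_K \Sigma^2(T, K) \\
&= a_m(T)\, k_m(T)\, S_m S'_m + b_{m+1}(T)\, k_{m+1}(T)\, S_{m+1} S'_{m+1}
\end{aligned}
$$

For the second term, we obtain:

$$
\begin{aligned}
\Sigma(T, K)\, \partial_K^2 \Sigma(T, K) &= \frac{1}{2} \partial_K^2 \Sigma^2(T, K) - \left( \partial_K \Sigma(T, K) \right)^2 \\
&= a_m(T)\, k_m^2(T) \left( S_m S''_m + (S'_m)^2 \right) + b_{m+1}(T)\, k_{m+1}^2(T) \left( S_{m+1} S''_{m+1} + (S'_{m+1})^2 \right) - \left( \partial_K \Sigma(T, K) \right)^2
\end{aligned}
$$

Since we have:

$$
\partial_T a_m(T) = -\frac{T_m T_{m+1}}{T^2 (T_{m+1} - T_m)}
$$

and:

$$
\partial_T b_{m+1}(T) = \frac{T_m T_{m+1}}{T^2 (T_{m+1} - T_m)}
$$

we deduce that the last term is equal to⁴⁷:

$$
\begin{aligned}
\Sigma(T, K)\, \partial_T \Sigma(T, K) &= \frac{1}{2} \partial_T \Sigma^2(T, K) \\
&= \frac{1}{2} \partial_T a_m(T)\, \Sigma^2(T_m, K_m(T)) + \frac{1}{2} \partial_T b_{m+1}(T)\, \Sigma^2(T_{m+1}, K_{m+1}(T)) \\
&\quad + a_m(T)\, \Sigma(T_m, K_m(T))\, \partial_T \Sigma(T_m, K_m(T)) + b_{m+1}(T)\, \Sigma(T_{m+1}, K_{m+1}(T))\, \partial_T \Sigma(T_{m+1}, K_{m+1}(T)) \\
&= \frac{1}{2} \left( S_{m+1}^2 - S_m^2 \right) \frac{T_m T_{m+1}}{T^2 (T_{m+1} - T_m)} + a_m(T)\, S_m S'_m K \partial_T k_m(T) + b_{m+1}(T)\, S_{m+1} S'_{m+1} K \partial_T k_{m+1}(T)
\end{aligned}
$$

In the case where $k_m(T) = k_{m+1}(T) = 1$, the previous formula reduces to:

$$
\Sigma(T, K)\, \partial_T \Sigma(T, K) = \frac{1}{2} \left( S_{m+1}^2 - S_m^2 \right) \frac{T_m T_{m+1}}{T^2 (T_{m+1} - T_m)}
$$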


In practice, we do not observe the function $S_m(K)$, but only a few values of $\Sigma(T_m, K_i)$ for
some maturities $T_m$ and some strikes $K_i$. An example is given in Table 9.10. We assume that
five maturities are quoted (1M, 3M, 6M, 1Y and 2Y). For each maturity, we observe the
implied volatility (expressed in %) for 9 strikes. This is why we have to use an interpolation
method. In Figure 9.27, we have represented the function $S_m(K)$ obtained with the cubic
spline method⁴⁸. One of the issues is the interpolated implied volatility on the wings. Here,
we have chosen to keep the cubic spline values, but an alternative approach is to assume
that the smile is constant before the first strike and after the last strike. Let us assume
that $S_0 = 100$, $b = 5\%$ and $r = 5\%$. Using the time interpolation approach, we obtain
the implied volatility surface given in Figure 9.28. The implied volatility is constant when
$T < 1/12$ and $T > 2$. Finally, the local volatility surface is reported in Figure 9.29. We
notice that it is not a smooth function. This is why we can use cubic spline approximation
or other smoothing methods in place of cubic spline interpolation⁴⁹. However, we do not
retrieve exactly the quoted implied volatilities with this approach.

47 We use the fact that $\partial_T K_m(T) = K \partial_T k_m(T)$ and $\partial_T K_{m+1}(T) = K \partial_T k_{m+1}(T)$.

TABLE 9.10: Calibration set

  T_m = 1/12
  K_i           87.0   92.0   96.0   98.0  100.0  103.0  106.0  110.0  116.0
  Σ(T_m,K_i)    13.7   13.7   13.3   13.2   13.0   13.1   13.2   13.5   13.5

  T_m = 3/12
  K_i           77.0   85.0   93.0   97.0  101.0  106.0  111.0  121.0  134.0
  Σ(T_m,K_i)    14.9   14.9   14.1   14.0   13.5   13.8   14.2   15.1   15.1

  T_m = 6/12
  K_i           66.0   78.0   89.0   96.0  102.0  111.0  119.0  136.0  161.0
  Σ(T_m,K_i)    16.8   16.8   15.5   15.0   14.5   15.0   15.5   16.8   16.8

  T_m = 1
  K_i           53.0   69.0   86.0   96.0  104.0  119.0  133.0  166.0  217.0
  Σ(T_m,K_i)    19.0   19.0   17.0   16.0   15.5   16.5   17.5   18.5   18.5

  T_m = 2
  K_i           37.0   56.0   80.0   96.0  103.0  137.0  163.0  229.0  347.0
  Σ(T_m,K_i)    21.9   21.9   20.0   18.5   18.5   19.0   19.5   20.8   20.8


Remark 105 In real life, the number of strikes may be different from one maturity to
another, and may be smaller. For example, in the case of currency options⁵⁰, we generally
have 5 quoted options (ATM, 10-delta call, 25-delta call, 10-delta put and 25-delta put).


Parametric calibration. In the previous section, $\Sigma(T, K)$ and $\sigma(T, K)$ are calibrated
using non-parametric approaches such as the cubic spline method. This produces an irregular
local volatility surface. In order to avoid this problem, we can use a parametric framework.
For instance, we can calibrate $\Sigma(T, K)$ using the SABR model. Another popular approach
is to consider the stochastic volatility inspired or SVI parametrization.
We recall that the total implied variance is equal to:

$$
v(T, K) = T\, \Sigma^2(T, K)
$$

We assume that $v(T, K) = \tilde{v}(T, x)$ and $\Sigma(T, K) = \tilde{\Sigma}(T, x)$ where $x$ is the log-moneyness:

$$
x = \varphi(T, K) = \ln \frac{K}{F(T)} = \ln \frac{K}{S_0 e^{bT}}
$$

48 See Appendix A.1.2.1 on page 1035.

49 See Crépey (2003) and Fengler (2009).

50 FX vanilla options are generally quoted in terms of volatility with respect to a fixed delta, and not in
terms of premium with respect to a given strike.

FIGURE 9.27: Cubic spline interpolation $S_m(K)$ (in %)

FIGURE 9.28: Implied volatility surface Σ(T, K) (in %)

FIGURE 9.29: Local volatility surface σ(T, K) (in %)


Let $v_T(x) = \tilde{v}(T, x)$ be the total implied variance for a given maturity slice. Gatheral
(2004) introduces the following SVI parametrization:

$$
v_T(x) = \alpha + \beta \left( \rho (x - m) + \sqrt{(x - m)^2 + \sigma^2} \right)
$$

where $\beta > 0$, $\sigma > 0$ and $\rho \in [-1, 1]$. We have:

$$
v_T(m) = \alpha + \beta\sigma
$$

and:

$$
\begin{cases}
\lim_{x \to -\infty} v_T(x) = \alpha - \beta (1 - \rho)(x - m) \\
\lim_{x \to +\infty} v_T(x) = \alpha + \beta (1 + \rho)(x - m)
\end{cases}
$$

Gatheral deduces that $\alpha$ controls the general level, $\beta$ influences the slope of the wings, $\sigma$
changes the curvature of the smile, $\rho$ impacts the symmetry of the smile while $m$ shifts the
smile.


Example 93 We assume that $\alpha = 2\%$, $\beta = 0.3$, $\sigma = 10\%$, $\rho = -40\%$ and $m = 0$. Figure
9.30 shows the impact of each parameter on the total variance $v_T(x)$.
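A minimal sketch of the SVI slice with the parameters of Example 93 is given below. The function name is hypothetical; the code simply evaluates the parametrization on a grid of log-moneyness values.

```python
from math import sqrt

def svi_total_variance(x, alpha, beta, sigma, rho, m):
    """SVI total implied variance for one maturity slice."""
    return alpha + beta * (rho * (x - m) + sqrt((x - m) ** 2 + sigma ** 2))

params = dict(alpha=0.02, beta=0.3, sigma=0.10, rho=-0.40, m=0.0)
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, round(svi_total_variance(x, **params), 4))
```

At $x = m$ the function returns $\alpha + \beta\sigma = 5\%$, and far in the wings the slopes approach $-\beta(1 - \rho) = -0.42$ on the left and $\beta(1 + \rho) = 0.18$ on the right, which reflects the negative value of $\rho$.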


Gatheral and Jacquier (2014) show that a volatility surface is free of static arbitrage if
and only if it is free of calendar spread arbitrage⁵¹ and each time slice is free of butterfly
arbitrage⁵². The first property implies that:

$$
\partial_T v(T, x) \geq 0
$$

for all $x \in \mathbb{R}$. Thanks to Breeden and Litzenberger (1978), the second property is equivalent
to verifying that⁵³:

$$
\frac{\partial^2 C(T, K)}{\partial K^2} \geq 0
$$

51 This means that the price of a European option is monotone with the maturity.
52 This means that the probability density function is non-negative for any given maturity $T$.

FIGURE 9.30: Impact of SVI parameters on the total variance $v_T(x)$ (four panels: impact of α, impact of β, impact of σ and impact of ρ)
These authors then deduce how the absence of static arbitrage impacts SVI parameters.

We consider the calibration set defined in Table 9.10 on page 554. We delete the two
extreme strikes of each maturity⁵⁴. In Figure 9.31, we show the SVI parametrization for
each maturity. By considering the time interpolation presented previously, we can define the
implied volatility surface $\Sigma(T, K)$ and then calculate the local volatility $\sigma(T, K)$. These two volatility
surfaces are reported in Figure 9.31.


Hedging coefficients. Let $\Sigma(T, K, S_t)$ and $\sigma(T, K, S_t)$ be the implied and local volatility
surfaces that depend on the current price $S_t$. We also write the value of the option
$V(T, K, S_t)$ as a function of the maturity $T$, the strike $K$ and the current price $S_t$. The
delta of the option is then equal to:

$$
\Delta_t = \frac{\partial V(T, K, S_t)}{\partial S_t}
$$

If we use the finite difference approximation, we obtain:

$$
\Delta_t \approx \frac{V(T, K, S_t + \varepsilon) - V(T, K, S_t - \varepsilon)}{2\varepsilon}
$$

53 See Section 9.1.1.4 on page 508.

54 In fact, we have added these two points in the calibration set in order to stabilize the non-parametric
calibration. However, this approach is not adequate because the volatility smile is linear and not constant at
extreme strikes (Lee, 2004).

FIGURE 9.31: SVI parametrization, implied volatility $\Sigma(T, K)$ and local volatility
$\sigma(T, K)$ (in %)


Computing the option price and its corresponding delta then requires calculating three local
volatility surfaces⁵⁵ and solving the forward PDE three times⁵⁶. This method can also be used
to calculate the gamma of the option, because we have:

$$\Gamma_t \approx \frac{V(T, K, S_t + \varepsilon) - 2V(T, K, S_t) + V(T, K, S_t - \varepsilon)}{\varepsilon^2}$$

The vega coefficient in a local volatility model is not well-defined. It can be measured with
respect to the local volatility $\sigma(T, K, S_t)$ or the implied volatility $\Sigma(T, K, S_t)$. The most
frequent approach is to measure the vega as the sensitivity of the price to a parallel shift of
$\Sigma(T, K, S_t)$. We have:

$$\upsilon_t \approx \frac{V'(T, K, S_t) - V(T, K, S_t)}{\varepsilon'}$$

where $V'(T, K, S_t)$ is the option price obtained when the implied volatility surface is
$\Sigma(T, K, S_t) + \varepsilon'$.
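The three bump-and-reprice formulas can be sketched as follows. As an assumption made only to keep the example self-contained, the pricer is the Black-Scholes call formula with a flat volatility rather than the local volatility PDE solver; in the local volatility setting, each call to `pricer` would be replaced by a recalibration and a PDE solve. All function names are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, vol, r, b, t):
    """Black-Scholes call with cost-of-carry b (stand-in for the PDE pricer)."""
    d1 = (math.log(s / k) + (b + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s * math.exp((b - r) * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def fd_greeks(pricer, s, vol, eps=0.01, eps_vol=1e-4):
    """Delta and gamma by central differences in the spot, vega by a parallel volatility shift."""
    v_up, v_mid, v_dn = pricer(s + eps, vol), pricer(s, vol), pricer(s - eps, vol)
    delta = (v_up - v_dn) / (2.0 * eps)
    gamma = (v_up - 2.0 * v_mid + v_dn) / eps ** 2
    vega = (pricer(s, vol + eps_vol) - v_mid) / eps_vol
    return delta, gamma, vega

delta, gamma, vega = fd_greeks(
    lambda s, v: bs_call(s, 100.0, v, 0.05, 0.05, 1.0), 100.0, 0.16)
```

The delta and the gamma share the same three prices, which is why three PDE solves are enough for both coefficients.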

One of the issues with the local volatility model is that the Greeks are not easy to compute
and are not stable over time and across strikes. This is a severe disadvantage, since the
hedging of the option is not straightforward and generally less efficient than the hedging
portfolio given by the Black-Scholes model:


“Market smiles and skews are usually managed by using local volatility models 
a la Dupire. We discover that the dynamics of the market smile predicted by 
local vol models is opposite of observed market behavior: when the price of the 
underlying decreases, local vol models predict that the smile shifts to higher 


55 We have to calculate $\sigma(T, K, S_t - \varepsilon)$, $\sigma(T, K, S_t)$ and $\sigma(T, K, S_t + \varepsilon)$.
56 We have to calculate $V(T, K, S_t - \varepsilon)$, $V(T, K, S_t)$ and $V(T, K, S_t + \varepsilon)$.
prices; when the price increases, these models predict that the smile shifts to
lower prices. Due to this contradiction between model and market, delta and 
vega hedges derived from the model can be unstable and may perform worse 
than naive Black-Scholes’ hedges” (Hagan et al., 2002, page 84). 


9.2.3.4 Application to exotic options 


Another shortcoming of the local volatility model is the unrealistic probability distribution
of the conditional random variable $S(t_2) \mid S(t_1)$. This is why this model is only used
for European options, and not for path-dependent derivatives. In particular, it was
popular in the 1990s and 2000s for pricing European barrier options.

We consider the calibration set given in Table 9.10 on page 554. We assume that $S_0 = 100$,
$b = 5\%$ and $r = 5\%$. We price different payoffs given in Table 9.11, whose parameters are
$K = 100$, $L = 90$ and $H = 115$. The maturity is set to one year. Prices are calculated with
a Crank-Nicolson scheme with 2000 discretization points⁵⁷ in space, 2000 discretization
points in time and traditional boundary conditions⁵⁸. Results are given in column LV. We
can compare them with Black-Scholes prices calculated with implied volatilities⁵⁹ $\Sigma_1 = 16\%$
and $\Sigma_2 = 15.5\%$. For each payoff and each value of implied volatility, we report two values
of the option price: one obtained by solving the PDE and another one calculated with the
analytical formulas of Rubinstein and Reiner (1991). We observe some differences between
the two prices, because the PDE price depends on the choice of the discretization scheme and
the boundary conditions. We notice that the prices DOC, UOC, KOC and BCC calculated
with the local volatility model are not in the interval of BS prices.


TABLE 9.11: Barrier option pricing with the local volatility model


                                                      BS (PDE)        BS (analytical)
Option  Payoff                                LV     Σ1      Σ2       Σ1      Σ2
Call    (S(T) − K)^+                          8.85   8.96    8.78     8.96    8.78
Put     (K − S(T))^+                          3.97   4.08    3.90     4.08    3.90
DOC     1{S(t) > L} · (S(T) − K)^+            7.98   8.14    8.05     8.11    8.02
DOP     1{S(t) > L} · (K − S(T))^+            0.26   0.27    0.28     0.25    0.27
UOC     1{S(t) < H} · (S(T) − K)^+            0.99   0.88    0.94     0.83    0.89
UOP     1{S(t) < H} · (K − S(T))^+            3.81   3.90    3.75     3.89    3.74
KOC     1{S(t) ∈ [L, H]} · (S(T) − K)^+       0.65   0.56    0.64     0.52    0.59
KOP     1{S(t) ∈ [L, H]} · (K − S(T))^+       0.20   0.20    0.22     0.19    0.21
BCC     1{S(T) > K}                           0.58   0.56    0.57     0.56    0.57
BCP     1{S(T) < K}                           0.37   0.39    0.38     0.39    0.38
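As an illustration of the analytical benchmark, the sketch below prices the down-and-out call (DOC) with the standard closed-form expression for a continuously monitored barrier when the strike is above the barrier, using in-out parity. The function names are hypothetical, and the inputs $S_0 = 100$ and $b = r = 5\%$ are those assumed for the table.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, vol, r, b, t):
    d1 = (math.log(s / k) + (b + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s * math.exp((b - r) * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def down_and_in_call(s, k, barrier, vol, r, b, t):
    """Closed-form down-and-in call, valid for k >= barrier and s > barrier."""
    mu = (b - 0.5 * vol ** 2) / vol ** 2
    sq = vol * math.sqrt(t)
    y = math.log(barrier ** 2 / (s * k)) / sq + (1.0 + mu) * sq
    return (s * math.exp((b - r) * t) * (barrier / s) ** (2.0 * (mu + 1.0)) * norm_cdf(y)
            - k * math.exp(-r * t) * (barrier / s) ** (2.0 * mu) * norm_cdf(y - sq))

def down_and_out_call(s, k, barrier, vol, r, b, t):
    """In-out parity: vanilla call minus down-and-in call."""
    return bs_call(s, k, vol, r, b, t) - down_and_in_call(s, k, barrier, vol, r, b, t)

doc_1 = down_and_out_call(100.0, 100.0, 90.0, 0.160, 0.05, 0.05, 1.0)
doc_2 = down_and_out_call(100.0, 100.0, 90.0, 0.155, 0.05, 0.05, 1.0)
```

Both values can be compared with the DOC row of the table for the two implied volatilities.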


57 We assume that $S(t) \in [0, 200]$.
58 We use the following Dirichlet and Neumann conditions:

Condition                        Options
$V(t, S^-) = 0$                  Call, BCC, DOC, DOP, UOC, KOC, KOP
$V(t, S^+) = 0$                  Put, BCP, DOP, UOC, UOP, KOC, KOP
$\partial_S V(t, S^-) = -1$      Put, BCP, UOP
$\partial_S V(t, S^+) = 0$       Call, BCC, DOC

where $S^- = 0$ and $S^+ = 200$.

59 $\Sigma_1 = 16\%$ and $\Sigma_2 = 15.5\%$ correspond to the two implied volatilities of strikes 96 and 104 for the
one-year maturity.
9.2.4 Stochastic volatility models


The most popular approach to model the volatility smile is to consider that the volatility
is not constant, but stochastic. In this case, we obtain a model with two state variables,
which are the spot price $S(t)$ and the volatility $\sigma(t)$. After deriving the general formula of
the fundamental pricing equation, we present Heston and SABR models, which are the two
most important parametrizations of this class of models.


9.2.4.1 General analysis 


Pricing formula We assume that the joint dynamics of the spot price $S(t)$ and the
stochastic volatility $\sigma(t)$ is:

$$\begin{cases} dS(t) = \mu(t) S(t) dt + \sigma(t) S(t) dW_1(t) \\ d\sigma(t) = \zeta(\sigma(t)) dt + \xi(\sigma(t)) dW_2(t) \end{cases}$$

where $\mathbb{E}[W_1(t) W_2(t)] = \rho t$. $S(t)$ is a geometric Brownian motion with time-varying
parameters $\mu(t)$ and $\sigma(t)$, whereas $\sigma(t)$ follows a general diffusion that does not depend on
$S(t)$. In the Black-Scholes model, the volatility has the status of parameter. In this new
approach, the volatility is a second state variable. The SV model is defined by the functions
$\zeta(y)$ and $\xi(y)$.

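Using Itô's lemma, we can show that the fundamental pricing equation defined on page
492 becomes⁶⁰:

$$\frac{1}{2}\sigma^2 S^2 \partial_S^2 V(t, S, \sigma) + \rho\sigma S \xi(\sigma) \partial_{S,\sigma}^2 V(t, S, \sigma) + \frac{1}{2}\xi^2(\sigma) \partial_\sigma^2 V(t, S, \sigma) + (\mu - \lambda_S \sigma) S \partial_S V(t, S, \sigma) + (\zeta(\sigma) - \lambda_\sigma \xi(\sigma)) \partial_\sigma V(t, S, \sigma) + \partial_t V(t, S, \sigma) - r V(t, S, \sigma) = 0$$

where $V(t, S, \sigma)$ is the price of the contingent claim, $V(T, S(T)) = f(S(T))$ and $f(S(T))$
is the option payoff. As previously, the market price of the spot risk $W_1(t)$ is:

$$\lambda_S(t) = \frac{\mu(t) - b(t)}{\sigma(t)}$$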

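By introducing the function $\zeta'(y)$ (the prime denotes the risk-adjusted drift, not a derivative):

$$\zeta'(y) = \zeta(y) - \lambda_\sigma \xi(y)$$

we obtain the following PDE:

$$\frac{1}{2}\sigma^2 S^2 \partial_S^2 V(t, S, \sigma) + \rho\sigma S \xi(\sigma) \partial_{S,\sigma}^2 V(t, S, \sigma) + \frac{1}{2}\xi^2(\sigma) \partial_\sigma^2 V(t, S, \sigma) + b S \partial_S V(t, S, \sigma) + \zeta'(\sigma) \partial_\sigma V(t, S, \sigma) + \partial_t V(t, S, \sigma) - r V(t, S, \sigma) = 0 \tag{9.35}$$

Equation (9.35) is the equivalent of Equation (9.2) on page 492 when the volatility is
stochastic.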


Using the Girsanov theorem, we deduce that the risk-neutral dynamics is:

$$\begin{cases} dS(t) = b(t) S(t) dt + \sigma(t) S(t) dW_1^{\mathbb{Q}}(t) \\ d\sigma(t) = \zeta'(\sigma(t)) dt + \xi(\sigma(t)) dW_2^{\mathbb{Q}}(t) \end{cases}$$

60 We omit the dependence in $t$ in order to simplify the notation.
The martingale solution is then equal to:

$$V_0 = \mathbb{E}^{\mathbb{Q}}\left[ e^{-\int_0^T r(s) ds} f(S(T)) \mid \mathcal{F}_0 \right]$$

We retrieve the formula obtained in the one-dimensional case. However, the computation
of the expected value is now more complex since $S(T)$ depends on the trajectory of the
volatility $\sigma(t)$.
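Because $S(T)$ depends on the whole path of $\sigma(t)$, a simple way to approximate the expectation is to simulate both state variables jointly. The sketch below is a minimal Euler scheme for a call payoff under the risk-neutral dynamics; the constant rate and cost-of-carry, the reflection of the volatility at zero, and the names `drift` and `diffusion` (standing for the risk-adjusted drift and the volatility-of-volatility function) are assumptions made for illustration.

```python
import math
import random

def mc_sv_call(s0, sigma0, k, r, b, t, rho, drift, diffusion,
               n_paths=10000, n_steps=20, seed=0):
    """Monte Carlo price of a European call in a generic stochastic volatility model."""
    rng = random.Random(seed)
    dt = t / n_steps
    sq_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        log_s, sigma = math.log(s0), sigma0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            # Second Brownian increment, correlated with the first one
            z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            log_s += (b - 0.5 * sigma ** 2) * dt + sigma * sq_dt * z1
            # Reflection at zero keeps the volatility non-negative (an assumption)
            sigma = abs(sigma + drift(sigma) * dt + diffusion(sigma) * sq_dt * z2)
        total += max(math.exp(log_s) - k, 0.0)
    return math.exp(-r * t) * total / n_paths
```

Setting `drift` and `diffusion` to zero functions gives a constant volatility, so the estimate should be close to the Black-Scholes price, which is a convenient sanity check.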


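Hedging portfolio The computation of greek coefficients is more complex in SV models.
This is why the definition of the hedging portfolio is not straightforward and depends on
the assumption on the smile dynamics. In the case of the Black-Scholes model, delta and
vega sensitivities are equal to:

$$\Delta_{BS} = \frac{\partial V_{BS}(S_0, K, \Sigma, T)}{\partial S_0}$$

and:

$$\upsilon_{BS} = \frac{\partial V_{BS}(S_0, K, \Sigma, T)}{\partial \Sigma}$$

In the case of the stochastic volatility model, we have:

$$\Delta_{SV} = \frac{\partial V_{SV}(S_0, K, \sigma_0, T)}{\partial S_0}$$

If we assume that $V_{SV}(S_0, K, \sigma_0, T) = V_{BS}(S_0, K, \Sigma_{SV}(T, S_0), T)$, we obtain:

$$\Delta_{SV} = \frac{\partial V_{BS}(S_0, K, \Sigma_{SV}, T)}{\partial S_0} + \frac{\partial V_{BS}(S_0, K, \Sigma_{SV}, T)}{\partial \Sigma_{SV}} \cdot \frac{\partial \Sigma_{SV}(T, S_0)}{\partial S_0} = \Delta_{BS} + \upsilon_{BS} \cdot \frac{\partial \Sigma_{SV}(T, S_0)}{\partial S_0}$$

Therefore, the delta of the SV model depends on the BS vega. Generally, we have
$\partial_{S_0} \Sigma_{SV}(T, S_0) \geq 0$, implying that $\Delta_{SV} \geq \Delta_{BS}$.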
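The decomposition of the SV delta into the BS delta plus a vega correction can be illustrated with a small numerical check. In the sketch below, the slope `dvol_ds` of the implied volatility with respect to the spot is a hypothetical input (in practice it would come from the SV model), and the function names are illustrative.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(s, k, vol, r, b, t):
    d1 = (math.log(s / k) + (b + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s * math.exp((b - r) * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def sv_delta(s, k, implied_vol, dvol_ds, r, b, t):
    """BS delta plus BS vega times the spot sensitivity of the implied volatility."""
    d1 = (math.log(s / k) + (b + 0.5 * implied_vol ** 2) * t) / (implied_vol * math.sqrt(t))
    delta_bs = math.exp((b - r) * t) * norm_cdf(d1)
    vega_bs = s * math.exp((b - r) * t) * norm_pdf(d1) * math.sqrt(t)
    return delta_bs + vega_bs * dvol_ds

# Hypothetical smile dynamics: the implied volatility rises by 0.1 vol point per unit of spot
implied = lambda s: 0.20 + 0.001 * (s - 100.0)
delta_sv = sv_delta(100.0, 100.0, implied(100.0), 0.001, 0.05, 0.05, 1.0)
delta_bs = sv_delta(100.0, 100.0, implied(100.0), 0.0, 0.05, 0.05, 1.0)
```

Here `delta_sv` exceeds `delta_bs` because the assumed slope is positive and the vega of a call is positive.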

The calculation of the vega coefficient is a second issue. Indeed, the natural hedging
portfolio should consist of two long/short exposures since we have two risk factors $S(t)$ and
$\sigma(t)$. Therefore, we can define the vega sensitivity as follows:

$$\upsilon_{SV} = \frac{\partial V_{SV}(S_0, K, \sigma_0, T)}{\partial \sigma_0}$$

However, this definition is of little interest since the stochastic volatility $\sigma(t)$ cannot be directly
or even indirectly traded. This is why most traders prefer to use a BS vega:

$$\upsilon_{SV} = \frac{\partial V_{BS}(S_0, K, \Sigma_{SV}(T, S_0), T)}{\partial \Sigma_{SV}}$$

Here, we make the assumption that the vega is calculated with respect to the implied
volatility $\Sigma_{SV}(T, S_0)$ deduced from the stochastic volatility model. It can be viewed as a
pure Black-Scholes vega, but most of the time, it corresponds to a shift of the implied volatility
surface. This approach requires a new calibration of the stochastic volatility parameters. In
some sense, the vega can be viewed as the difference between the prices obtained with two
stochastic volatility models.
9.2.4.2 Heston model


Heston (1993) assumes that the stochastic differential equation of the spot price is equal
to:

$$\begin{cases} dS(t) = \mu S(t) dt + \sqrt{v(t)} S(t) dW_1(t) \\ dv(t) = \kappa(\theta - v(t)) dt + \xi\sqrt{v(t)} dW_2(t) \end{cases}$$

where $S(0) = S_0$, $v(0) = v_0$ and $W(t) = (W_1(t), W_2(t))$ is a two-dimensional Wiener
process with $\mathbb{E}[W_1(t) W_2(t)] = \rho t$. We notice that the stochastic variance $v(t)$ follows a
CIR process: $\theta$ is the long-run variance, $\kappa$ is the mean-reverting parameter and $\xi$ is the
volatility of the variance (also called the vovol parameter).


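Remark 106 We have $\sigma(t) = \sqrt{v(t)}$ and:

$$d\sigma(t) = \left( \left( \frac{\kappa\theta}{2} - \frac{\xi^2}{8} \right) \frac{1}{\sigma(t)} - \frac{\kappa}{2}\sigma(t) \right) dt + \frac{\xi}{2} dW_2(t)$$

The stochastic volatility is then an Ornstein-Uhlenbeck process if we impose $\theta = \xi^2 / (4\kappa)$.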


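As the second state variable of the Heston model is the stochastic variance $v(t)$, the
price $V(t, S, v)$ of the option must satisfy the PDE⁶¹:

$$\frac{1}{2} v S^2 \partial_S^2 V + \rho\xi v S \partial_{S,v}^2 V + \frac{1}{2}\xi^2 v \partial_v^2 V + b S \partial_S V + (\kappa(\theta - v) - \lambda v) \partial_v V + \partial_t V - r V = 0$$

It follows that the risk-neutral dynamics is:

$$\begin{cases} dS(t) = b S(t) dt + \sqrt{v(t)} S(t) dW_1^{\mathbb{Q}}(t) \\ dv(t) = (\kappa(\theta - v(t)) - \lambda v(t)) dt + \xi\sqrt{v(t)} dW_2^{\mathbb{Q}}(t) \end{cases}$$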

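In the case of European call and put options, Heston (1993) gives a closed-form solution of
the price:

$$C_0 = S_0 e^{(b-r)T} P_1 - K e^{-rT} P_2$$
$$P_0 = S_0 e^{(b-r)T} (P_1 - 1) - K e^{-rT} (P_2 - 1)$$

where the probabilities $P_1$ and $P_2$ satisfy:

$$P_j = \frac{1}{2} + \frac{1}{\pi} \int_0^\infty \mathrm{Re}\left( \frac{e^{-i\phi \ln K} \varphi_j(S_0, v_0, T, \phi)}{i\phi} \right) d\phi$$
$$\varphi_j(S_0, v_0, T, \phi) = \exp\left( C_j(T, \phi) + D_j(T, \phi) v_0 + i\phi \ln S_0 \right)$$
$$C_j(T, \phi) = i b \phi T + \frac{a_j}{\xi^2} \left( (b_j - i\rho\xi\phi + d_j) T - 2 \ln\left( \frac{1 - g_j e^{d_j T}}{1 - g_j} \right) \right)$$
$$D_j(T, \phi) = \frac{b_j - i\rho\xi\phi + d_j}{\xi^2} \cdot \frac{1 - e^{d_j T}}{1 - g_j e^{d_j T}}$$
$$g_j = \frac{b_j - i\rho\xi\phi + d_j}{b_j - i\rho\xi\phi - d_j}$$
$$d_j = \sqrt{(i\rho\xi\phi - b_j)^2 - \xi^2 (2 i u_j \phi - \phi^2)}$$

where $a_1 = a_2 = \kappa\theta$, $b_1 = \kappa + \lambda - \rho\xi$, $b_2 = \kappa + \lambda$, $u_1 = 1/2$ and $u_2 = -1/2$.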
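The closed-form solution can be evaluated with a direct numerical integration. The sketch below uses the principal branch of the complex logarithm provided by `cmath` and a midpoint rule on a truncated interval; it does not include any branch correction, so it should only be trusted for short maturities. The truncation bound, the number of integration points and the function names are arbitrary assumptions.

```python
import cmath
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, vol, r, b, t):
    d1 = (math.log(s / k) + (b + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return s * math.exp((b - r) * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def heston_phi(j, u, s0, v0, t, b, kappa, theta, xi, rho, lam=0.0):
    """Characteristic function of the probability P_j (j = 1 or 2)."""
    b_j = kappa + lam - rho * xi if j == 1 else kappa + lam
    u_j = 0.5 if j == 1 else -0.5
    iu = 1j * u
    d = cmath.sqrt((rho * xi * iu - b_j) ** 2 - xi ** 2 * (2.0 * u_j * iu - u * u))
    g = (b_j - rho * xi * iu + d) / (b_j - rho * xi * iu - d)
    e = cmath.exp(d * t)
    c_term = b * iu * t + kappa * theta / xi ** 2 * (
        (b_j - rho * xi * iu + d) * t - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g)))
    d_term = (b_j - rho * xi * iu + d) / xi ** 2 * (1.0 - e) / (1.0 - g * e)
    return cmath.exp(c_term + d_term * v0 + iu * math.log(s0))

def heston_call(s0, k, t, r, b, v0, kappa, theta, xi, rho, lam=0.0,
                u_max=200.0, n=4000):
    """European call price; midpoint rule on (0, u_max), so u = 0 is never evaluated."""
    du = u_max / n
    probs = []
    for j in (1, 2):
        total = 0.0
        for i in range(n):
            u = (i + 0.5) * du
            phi = heston_phi(j, u, s0, v0, t, b, kappa, theta, xi, rho, lam)
            total += (cmath.exp(-1j * u * math.log(k)) * phi / (1j * u)).real
        probs.append(0.5 + total * du / math.pi)
    return s0 * math.exp((b - r) * t) * probs[0] - k * math.exp(-r * t) * probs[1]
```

With a small volatility of variance and $v_0 = \theta$, the result should be close to the Black-Scholes price computed with $\sqrt{v_0}$, which gives a quick consistency check of the implementation.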


61 Heston (1993) makes the assumption that $\lambda_v(t) \propto \sqrt{v(t)}$.
The existence of these semi-analytical formulas for European options is one of the main
factors explaining the popularity of the Heston model. However, the implementation of
the formulas is not straightforward since it requires computing the integral of the inverse
Fourier transform. In particular, Kahl and Jäckel (2005) show that the evaluation of logarithms
with complex arguments may produce a numerical instability. Numerical software
will generally do the following computation:


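$$\ln\left( \frac{1 - g_j e^{d_j T}}{1 - g_j} \right) = \ln r + i\psi$$

where:

$$r = \left| \frac{1 - g_j e^{d_j T}}{1 - g_j} \right|$$

and:

$$\psi = \arg\left( \frac{1 - g_j e^{d_j T}}{1 - g_j} \right)$$

However, the fact that $\psi \in [-\pi, \pi]$ will create a discontinuity when integrating the function.
In order to circumvent this problem, we note:

$$g_j = r(g_j) e^{i\psi(g_j)}$$

and:

$$d_j = a(d_j) + i b(d_j)$$

Kahl and Jäckel (2005) deduce that:

$$g_j - 1 = r(g_j) e^{i\psi(g_j)} - 1 = \tilde{r} e^{i(\tilde{\chi}_j + 2\pi\tilde{m})}$$

where $\tilde{m} = \lfloor (2\pi)^{-1} (\psi(g_j) + \pi) \rfloor$, $\tilde{\chi}_j = \arg(g_j - 1)$ and $\tilde{r} = |g_j - 1|$. They also found that:

$$g_j e^{d_j T} - 1 = r(g_j) e^{i\psi(g_j)} e^{a(d_j)T + i b(d_j)T} - 1 = r(g_j) e^{a(d_j)T} e^{i(\psi(g_j) + b(d_j)T)} - 1 = \check{r} e^{i(\check{\chi}_j + 2\pi\check{m})}$$

where $\check{m} = \lfloor (2\pi)^{-1} (\psi(g_j) + b(d_j)T + \pi) \rfloor$, $\check{\chi}_j = \arg(g_j e^{d_j T} - 1)$ and $\check{r} = |g_j e^{d_j T} - 1|$.
Finally, they obtain:

$$\ln\left( \frac{1 - g_j e^{d_j T}}{1 - g_j} \right) = \ln\frac{\check{r}}{\tilde{r}} + i\left( \check{\chi}_j - \tilde{\chi}_j + 2\pi\check{m} - 2\pi\tilde{m} \right)$$

In Figure 9.32, we show the functions $f_1(u)$ and $f_2(u)$ defined by:

$$f_j(u) = \mathrm{Re}\left( \frac{e^{-iu \ln K} \varphi_j(S_0, v_0, T, u)}{iu} \right)$$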


The parameters are $S_0 = 100$, $K = 100$, $T = 30$, $b = 0$, $v_0 = 0.2$, $\kappa = 1$, $\theta = 0.2$, $\xi = 0.5$
and $\lambda = 0$. For $f_1(u)$, we use $\rho = 30\%$ whereas $\rho$ is set to $-30\%$ for the function $f_2(u)$. We
see the discontinuity produced by numerical software. The Kahl-Jäckel method produces
continuous functions without jumps. The problem can sometimes affect the two functions
$f_1(u)$ and $f_2(u)$. This is the case in Figure 9.33 with the following parameters: $S_0 = 100$,
$K = 100$, $T = 30$, $b = 0.05$, $v_0 = 4\%$, $\kappa = 0.5$, $\theta = 4\%$, $\xi = 0.7$, $\rho = -0.80$ and $\lambda = 0$.
Again, the Kahl-Jäckel method provides the right correction.

FIGURE 9.32: Functions $f_1(u)$ and $f_2(u)$ ($\kappa = 1$)

FIGURE 9.34: Implied volatility of the Heston model (in %)


Example 94 The parameters are equal to $S_0 = 100$, $b = r = 5\%$, $v_0 = \theta = 4\%$, $\kappa = 0.5$,
$\xi = 0.9$ and $\lambda = 0$. We consider the pricing of the European call option, whose maturity is
three months.


Figure 9.34 shows the implied volatility for different values of the strike $K$ and the
correlation $\rho$. We notice that the Heston model can produce different shapes of the volatility
surface. In Figure 9.35, we have reported the skew of the implied volatility defined by:

$$\frac{\partial \Sigma(T, K)}{\partial K}$$


Several authors have proposed approximations of the Heston implied volatility $\Sigma_t(T, K)$.
We can cite Schönbucher (1999), Forde and Jacquier (2009), and Gatheral and Jacquier
(2011). A more general approach has been proposed by Durrleman (2010), who assumes
that the dynamics of $S_t$ is Markovian with:

$$S(t) = S(0) \exp\left( \int_0^t \sigma(s) dW(s) - \frac{1}{2} \int_0^t \sigma^2(s) ds \right)$$

and:

$$\begin{cases} d\sigma^2(t) = \mu(t) dt - 2\sigma(t) \left( a(t) dW(t) + \tilde{a}(t) d\tilde{W}(t) \right) \\ d\mu(t) = (\cdot) dt + \omega(t) dW(t) + (\cdot) d\tilde{W}(t) \\ da(t) = m(t) dt + u(t) dW(t) + \tilde{u}(t) d\tilde{W}(t) \\ d\tilde{a}(t) = (\cdot) dt + v(t) dW(t) + (\cdot) d\tilde{W}(t) \\ du(t) = (\cdot) dt + x(t) dW(t) + (\cdot) d\tilde{W}(t) \end{cases}$$
where (-) is a generic symbol for a continuous adapted process. Durrleman (2010) shows 
that: 


w(T, K) = 


and: 


X; (T, K) = o?” (t) + a (t) s (t) 4 
FIGURE 9.35: Skew of the Heston model (in bps)


where $\tau = T - t$ and $s(t) = \ln S(t) - \ln K$. The coefficients $b(t)$, $c(t)$, $d(t)$ and $e(t)$ are
given by:


and: 


d(t) = 2m(t) w(t) x(t) aue) aHa) awe, 
— 3 30 (t) 2 302 (t) — G6o(t) © olt) | 
2a(t)a?(t)  2u(t)o(t) a? (t) 
302 (t) 3 3 
ee a(t) | 2a(t)u(t) 3a(t)a(t) @(t)v(t) 3a? (t) 
~ Qa? (t) a(t) 203 (t o3(t) ` 204(t) 
4a (t) a? (t) 
a* (t) 


In the case of the Heston model, we have:

$$\begin{cases} dS(t) = \sigma(t) S(t) dW(t) \\ d\sigma^2(t) = \kappa(\theta - \sigma^2(t)) dt + \xi\sigma(t) \left( \rho\, dW(t) + \sqrt{1 - \rho^2}\, d\tilde{W}(t) \right) \end{cases}$$

It follows that $a(t) = -\rho\xi/2$, $\tilde{a}(t) = -\xi\sqrt{1 - \rho^2}/2$ and $\omega(t) = m(t) = u(t) = \tilde{u}(t) = v(t) = x(t) = 0$. We deduce that:


In Figure 9.36, we have generated the volatility surface using the Durrleman formula of
the Heston model approximation. The parameters are $S(t) = 100$, $\sigma(t) = 20\%$, $\kappa = 0.5$,
$\theta = 4\%$ and $\xi = 0.2$. We consider different values for the correlation parameter $\rho$ and the
maturity $T$. We notice that the Durrleman formula does not fit the Heston smile correctly
when the absolute value $|\rho|$ of the correlation is high.


FIGURE 9.36: Implied volatility of the Durrleman formula (in %) (horizontal axis: strike $K$)


Example 95 We assume that $S(t) = 100$ and $T = 0.5$. The volatility smile is given by the
following values:


K                         90.00   95.00   100.00   105.00   110.00
$\Sigma_t(T, K)$ (in %)   20.25   19.92   19.67    19.49    19.38

FIGURE 9.37: Calibration of the smile by the Heston model and the Durrleman formula


The calibration of the smile gives the following result⁶²:


Model       $\sigma(t)$   $\kappa$   $\theta$   $\xi$    $\rho$
Heston      0.201         0.980      0.040      0.192    −0.207
Durrleman   0.222         1.000      0.014      0.191    −0.193


The volatility surface of each calibrated model is represented in Figure 9.37. The results are
very similar.


Remark 107 The Heston model was very popular in the 2000s. Nevertheless, even if we
have an analytical formula for the call and put prices, the absence of a true implied volatility
formula was an obstacle to its development, and the use of the Heston model is less
frequent today. The Heston model has largely been replaced by the SABR model, because of the
availability of an implied volatility formula.


9.2.4.3 SABR model 


Hagan et al. (2002) suggest using the SABR⁶³ model to take into account the smile
effect. The dynamics of the forward rate $F(t)$ is given by:

$$\begin{cases} dF(t) = \alpha(t) F(t)^\beta dW_1^{\mathbb{Q}}(t) \\ d\alpha(t) = \nu\alpha(t) dW_2^{\mathbb{Q}}(t) \end{cases}$$

where $\mathbb{E}[W_1^{\mathbb{Q}}(t) W_2^{\mathbb{Q}}(t)] = \rho t$. Since $\beta \in [0, 1]$, $\alpha(t)$ is not necessarily the instantaneous
volatility of $F(t)$ except in the cases $\beta = 0$ (Gaussian volatility) and $\beta = 1$ (log-normal


62 It consists of minimizing the sum of squared errors between observed implied volatilities and theoretical
implied volatilities deduced from the option model.
63 This is the acronym of stochastic-$\alpha$-$\beta$-$\rho$.
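volatility). The model has also 4 parameters: $\alpha$ the current value of $\alpha(t)$, $\beta$ the exponent
of the forward rate, $\nu$ the log-normal volatility of $\alpha(t)$ and $\rho$ the correlation between the
two Brownian motions. One of the main advantages of the SABR model is that we have an
approximate formula of the implied Black volatility:

$$\Sigma_B(T, K) = \frac{\alpha}{(F_0 K)^{(1-\beta)/2} \left( 1 + \frac{(1-\beta)^2}{24} \ln^2\frac{F_0}{K} + \frac{(1-\beta)^4}{1920} \ln^4\frac{F_0}{K} \right)} \cdot \frac{z}{\chi(z)} \cdot \left( 1 + \left( \frac{(1-\beta)^2 \alpha^2}{24 (F_0 K)^{1-\beta}} + \frac{\rho\alpha\nu\beta}{4 (F_0 K)^{(1-\beta)/2}} + \frac{2 - 3\rho^2}{24}\nu^2 \right) T \right)$$

where $z = \nu\alpha^{-1} (F_0 K)^{(1-\beta)/2} \ln\frac{F_0}{K}$ and $\chi(z) = \ln\left( \sqrt{1 - 2\rho z + z^2} + z - \rho \right) - \ln(1 - \rho)$.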
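The approximate formula is simple to code. The sketch below implements it directly, with one added assumption: the ratio $z / \chi(z)$ is replaced by its limit 1 when the strike is at the money. The function name and argument order are illustrative.

```python
import math

def sabr_implied_vol(k, f0, t, alpha, beta, nu, rho):
    """Approximate Black implied volatility of the SABR model."""
    one_b = 1.0 - beta
    fk = (f0 * k) ** (one_b / 2.0)          # (F0 K)^((1 - beta) / 2)
    log_fk = math.log(f0 / k)
    z = nu / alpha * fk * log_fk
    if abs(z) < 1e-12:
        ratio = 1.0                          # limit of z / chi(z) at the money
    else:
        chi = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        ratio = z / chi
    denom = fk * (1.0 + one_b ** 2 / 24.0 * log_fk ** 2 + one_b ** 4 / 1920.0 * log_fk ** 4)
    corr = 1.0 + (one_b ** 2 * alpha ** 2 / (24.0 * fk ** 2)
                  + rho * alpha * nu * beta / (4.0 * fk)
                  + (2.0 - 3.0 * rho ** 2) * nu ** 2 / 24.0) * t
    return alpha / denom * ratio * corr

# One-year smile for a forward rate of 5% with a log-normal backbone (beta = 1)
strikes = [0.03, 0.04, 0.05, 0.07]
smile = [sabr_implied_vol(k, 0.05, 1.0, 0.0878, 1.0, 0.3388, -0.1552) for k in strikes]
```

The parameter values in the usage lines are one of the calibrated sets reported later in this section for a smile quoted at these four strikes.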


Let us see the interpretation of the parameters⁶⁴. We have represented their impact in
Figures⁶⁵ 9.38 and 9.39. The parameter $\beta$ defines a stochastic log-normal model
when $\beta$ is equal to 1, a stochastic normal model when $\beta$ is equal to 0, or a hybrid model otherwise.
The choice of $\beta$ is generally exogenous. The main reason is that $\beta$ is highly related to the
dynamics of the ATM implied volatility. If $\beta$ is equal to 1, we observe a simple translation
of the smile when the forward rate moves (first panel in Figure 9.38). If $\beta$ is equal to 0, the
ATM implied volatility decreases when the forward rate increases (second panel in Figure
9.38). This explains the behavior of the backbone, which represents the dynamics of the
ATM implied volatility when the forward rate varies (third panel in Figure 9.38).


FIGURE 9.38: Impact of the parameter $\beta$ (vertical axis: $\Sigma(T, K)$ in %)


64 In the following examples, we consider a one-year option, whose current forward rate $F_0$ is equal to 5%.
65 The default values are $\alpha = 10\%$, $\beta = 1$, $\nu = 50\%$ and $\rho = 0$.

FIGURE 9.39: Impact of the parameters $\alpha$, $\nu$ and $\rho$


The parameter $\alpha$ controls the level of implied volatilities (see Panel 1 in Figure 9.39).
In particular, $\alpha$ is close to the value of the ATM volatility when $\beta$ is equal to one⁶⁶. $\nu$ is
called the vovol (or vol-vol) parameter, because it measures the volatility of the volatility. $\nu$
then impacts the stochastic property of the volatility $\alpha(t)$. The limit case $\nu = 0$ corresponds
to the constant volatility and we obtain the classical Black model⁶⁷. An increase of $\nu$ tends
to increase the slope of the implied volatility (see Panel 2 in Figure 9.39). The asymmetry
of the smile is due to the parameter $\rho$. For instance, if $\rho$ is negative, the skew is more
pronounced on the left side than on the right side (see Panel 3 in Figure 9.39).


Remark 108 The parameters $\beta$ and $\rho$ impact the slope of the smile in a similar way.
Therefore, they are not jointly identifiable. For example, let us consider the following smile
when $F_0$ is equal to 5%: $\Sigma_B(1, 3\%) = 13\%$, $\Sigma_B(1, 4\%) = 10\%$, $\Sigma_B(1, 5\%) = 9\%$ and
$\Sigma_B(1, 7\%) = 10\%$. If we calibrate this smile for different values of $\beta$, we obtain the following
solutions:

$\beta$ | $\alpha$   $\nu$     $\rho$
0.0     | 0.0044     0.3203    0.2106
0.5     | 0.0197     0.3244    0.0248
1.0     | 0.0878     0.3388    −0.1552


We have represented the corresponding smiles in Figure 9.40 and we verify that the three
sets of calibrated parameters give the same smile.


66 In this case, we have:

$$\Sigma_B(T, F_0) = \alpha \left( 1 + \left( \frac{\rho\alpha\nu}{4} + \frac{2 - 3\rho^2}{24}\nu^2 \right) T \right)$$

It follows that $\Sigma_B(T, F_0)$ is exactly equal to $\alpha$ when $\nu$ is equal to zero.
67 When $\beta$ is equal to one of course.


FIGURE 9.40: Implied volatility for different parameter sets $(\beta, \rho)$ (horizontal axis: $K$ in %; vertical axis: $\Sigma(T, K)$ in %)


We have seen that the choice of $\beta$ is not important for calibrating the SABR model for
a given maturity. We have also seen that the parameter $\beta$ has a great impact on the
dynamics of the backbone. Therefore, there are two approaches for estimating $\beta$:


1. $\beta$ can be chosen from prior beliefs ($\beta = 0$ for the normal model, $\beta = 0.5$ for the CIR
model and $\beta = 1$ for the log-normal model);


2. $\beta$ can be statistically estimated by considering the dynamics of the forward rate.


TABLE 9.12: Calibration of the parameter $\beta$ in the SABR model


Rate    |     Level       |   Difference    |  Empirical quantile of $\hat\beta_{t,t+h}$
        | $\hat\beta$  R² | $\hat\beta$  R² |   10%     25%     50%    75%    90%
1y1y    | −0.06   0.91    |  0.59   0.15    | −2.01   −0.14    0.71   1.00   2.17
1y5y    | −0.29   0.87    |  0.382  0.27    | −1.80   −0.28    0.73   1.11   2.76
1y10y   | −0.37   0.80    |  0.34   0.22    | −2.04   −0.23    0.71   1.11   2.69
5y1y    |  0.42   0.29    |  0.35   0.22    | −1.58   −0.31    0.71   1.00   2.38
5y5y    | −0.01   0.73    |  0.23   0.28    | −2.12   −0.36    0.61   1.00   2.52
5y10y   | −0.10   0.69    |  0.27   0.23    | −1.99   −0.30    0.70   1.05   2.58
10y1y   |  0.96   0.00    |  0.28   0.20    | −1.88   −0.20    0.80   1.07   2.43
10y5y   | −0.10   0.65    |  0.28   0.20    | −2.02   −0.29    0.73   1.02   2.76
10y10y  | −0.47   0.73    |  0.27   0.20    | −1.71   −0.24    0.85   1.07   2.98


The second approach is based on the approximation of the ATM volatility:

$$\Sigma_t(T, F_t) \approx \frac{\alpha}{F_t^{1-\beta}}$$

We have:

$$\ln \Sigma_t(T, F_t) = \ln\alpha + (\beta - 1) \ln F_t + u_t \tag{9.36}$$
We can then estimate $\beta$ by considering the linear regression of the logarithm of the ATM
volatility on the logarithm of the forward rate. However, these two variables are generally
integrated of order one or $I(1)$. A better approach is then to consider the alternative linear
regression⁶⁸:

$$\ln \Sigma_{t+h}(T, F_{t+h}) - \ln \Sigma_t(T, F_t) = c + (\beta - 1) (\ln F_{t+h} - \ln F_t) + u_t \tag{9.37}$$

where $c$ is a constant. In this case, the linear regression is performed using the difference
and not the level of implied volatilities. Using the Libor EUR rates between 2000 and 2003,
we obtain results given in Table 9.12. In the first column, we indicate the maturity and the
tenor of the forward rate. The next two columns report the estimate $\hat\beta$ and the R-squared
coefficient $R^2$ for the regression model (9.36). Then, we have the values of $\hat\beta$ and $R^2$ for the
regression model⁶⁹ (9.37). We observe some strong differences between the two approaches
(see also the probability density function of $\hat\beta$ in Figure 9.41). These results show that the
regression model (9.36) produces bad results. However, it does not mean that the second
regression model (9.37) is more robust. Indeed, we can calculate the exact value $\hat\beta_{t,t+h}$ that
explains the dynamics of the ATM volatility from time $t$ to time $t + h$:

$$\hat\beta_{t,t+h} = \frac{\ln\left( F_{t+h} \cdot \Sigma_{t+h}(T, F_{t+h}) \right) - \ln\left( F_t \cdot \Sigma_t(T, F_t) \right)}{\ln F_{t+h} - \ln F_t}$$

In Table 9.12, we notice the wide dispersion of $\hat\beta_{t,t+h}$. On average, the parameter $\beta$ is around
70%, but it can also take some large negative or positive values. This is why $\beta$ is generally
chosen from prior beliefs.
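The regression in differences and the pointwise estimate can be sketched as follows. The data below are synthetic and generated from the ATM approximation with a known $\beta$, purely to check that both estimators recover it; the function names are hypothetical and no market data are used.

```python
import math

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def beta_from_differences(forwards, atm_vols):
    """Estimate beta from the regression of log-volatility changes on log-forward changes."""
    dx = [math.log(f1 / f0) for f0, f1 in zip(forwards, forwards[1:])]
    dy = [math.log(v1 / v0) for v0, v1 in zip(atm_vols, atm_vols[1:])]
    return 1.0 + ols_slope(dx, dy)

def pointwise_beta(f0, f1, vol0, vol1):
    """Exact beta explaining the move of the ATM volatility between two dates."""
    return (math.log(f1 * vol1) - math.log(f0 * vol0)) / (math.log(f1) - math.log(f0))

# Synthetic series generated with beta = 0.7 and alpha = 2% (illustrative values)
forwards = [0.05 * math.exp(0.02 * math.sin(0.3 * i)) for i in range(250)]
atm_vols = [0.02 / f ** (1.0 - 0.7) for f in forwards]
beta_hat = beta_from_differences(forwards, atm_vols)
```

On real data, the pointwise values are very noisy when the forward rate barely moves, because the denominator is close to zero; this is consistent with the wide dispersion reported in Table 9.12.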


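Once we have set the value of $\beta$, we estimate the parameters $(\alpha, \nu, \rho)$ by fitting the
observed implied volatilities. However, we have seen that $\alpha$ is highly related to the ATM
volatility. Indeed, we have:

$$\Sigma_B(T, F_0) = \frac{\alpha}{F_0^{1-\beta}} \left( 1 + \left( \frac{(1-\beta)^2 \alpha^2}{24 F_0^{2-2\beta}} + \frac{\rho\alpha\nu\beta}{4 F_0^{1-\beta}} + \frac{2 - 3\rho^2}{24}\nu^2 \right) T \right)$$

We deduce that:

$$\frac{(1-\beta)^2 T}{24 F_0^{2-2\beta}} \alpha^3 + \frac{\rho\nu\beta T}{4 F_0^{1-\beta}} \alpha^2 + \left( 1 + \frac{2 - 3\rho^2}{24}\nu^2 T \right) \alpha - \Sigma_B(T, F_0) F_0^{1-\beta} = 0$$

Let $\alpha = g_\alpha(\Sigma_B(T, F_0), \nu, \rho)$ be the positive root of the cubic equation. Therefore, imposing
that the smile passes through the ATM volatility $\Sigma_B(T, F_0)$ allows us to reduce the calibration
to two parameters $(\nu, \rho)$.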
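A minimal sketch of the function $g_\alpha$ is given below. It solves the cubic equation by bisection after bracketing a sign change from zero; this numerical choice is an assumption (a closed-form cubic solver would also work), and the routine returns the root found in the first bracket, which is the relevant one for realistic parameter values.

```python
def sabr_alpha_from_atm(atm_vol, f0, t, beta, nu, rho):
    """Positive root of the cubic equation linking alpha to the ATM volatility."""
    one_b = 1.0 - beta
    c3 = one_b ** 2 * t / (24.0 * f0 ** (2.0 * one_b))
    c2 = rho * nu * beta * t / (4.0 * f0 ** one_b)
    c1 = 1.0 + (2.0 - 3.0 * rho ** 2) * nu ** 2 * t / 24.0
    c0 = atm_vol * f0 ** one_b

    def poly(a):
        return ((c3 * a + c2) * a + c1) * a - c0

    lo, hi = 0.0, c0                 # poly(0) = -c0 < 0
    for _ in range(60):              # enlarge the bracket until the sign changes
        if poly(hi) > 0.0:
            break
        hi *= 2.0
    else:
        raise ValueError("no positive root bracketed")
    for _ in range(200):             # bisection
        mid = 0.5 * (lo + hi)
        if poly(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

alpha = sabr_alpha_from_atm(0.09, 0.05, 1.0, 1.0, 0.322, -0.229)
```

In a two-parameter calibration, this function would be called inside the objective function for each trial value of $(\nu, \rho)$, so that the fitted smile always passes through the market ATM volatility.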


Example 96 We consider the following smile:


K (in %)                 2.8    3.0    3.5    3.7    4.0    4.5    5.0    7.0
$\Sigma(T, K)$ (in %)    13.2   12.8   12.0   11.6   11.0   10.0   9.0    10.0


The maturity $T$ is equal to one year and the forward rate $F_0$ is set to 5%.


68 We have: 


Leth (T, Feta) -on 
Z (T, V 


69 In this case, we set $h$ to one trading day.

FIGURE 9.41: Probability density function of the estimate $\hat\beta$ (SABR model)


If we consider a stochastic log-normal model ($\beta = 1$), we obtain the following results:


Calibration   $\alpha$ (in %)   $\beta$   $\nu$    $\rho$ (in %)   RSS     $\Sigma_{ATM}$ (in %)
#1            9.466             1.00      0.279    −23.70          0.630   9.51
#2            8.944             1.00      0.322    −22.90          1.222   9.00


RSS indicates the residual sum of squares (expressed in bps). In the first calibration, we
estimate the three parameters $\alpha$, $\nu$ and $\rho$. In this case, the residual sum of squares is equal to
0.63 bps, but the SABR ATM volatility is equal to 9.51%, which is far from the market ATM
volatility. In the second calibration, we estimate the two parameters $\nu$ and $\rho$, whereas $\alpha$ is
the solution of the cubic equation that fits the ATM volatility. We notice that the residual
sum of squares has increased from 0.63 bps to 1.222 bps, but the SABR ATM volatility
is exactly equal to the market ATM volatility. The two calibrated smiles are reported in
Figure 9.42.


Remark 109 One of the issues with implied volatility calibration is that we generally have 
more market prices for the put (or left) wing of the smile than its call (or right) wing. This 
implies that the put wing is better calibrated than the call wing, and we may observe a large 
difference between the calibrated ATM volatility and the market ATM volatility. Therefore, 
professionals prefer the second calibration. 


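The sensitivities correspond to the following formulas⁷⁰:

$$\Delta = \frac{\partial C_B}{\partial F_0} + \frac{\partial C_B}{\partial \Sigma} \cdot \frac{\partial \Sigma_B(T, K)}{\partial F_0}$$

and:

$$\upsilon = \frac{\partial C_B}{\partial \Sigma} \cdot \frac{\partial \Sigma_B(T, K)}{\partial \alpha}$$

To obtain these formulas, we apply the chain rule on the Black formula by assuming that
the volatility $\Sigma$ is not constant and depends on $F_0$ and $\alpha$.

70 If we consider the parametrization $\alpha = g_\alpha(\Sigma_{ATM}, \nu, \rho)$, we have:

$$\Delta = \frac{\partial C_B}{\partial F_0} + \frac{\partial C_B}{\partial \Sigma} \cdot \left( \frac{\partial \Sigma_B(T, K)}{\partial F_0} + \frac{\partial \Sigma_B(T, K)}{\partial \alpha} \cdot \frac{\partial g_\alpha}{\partial F_0} \right)$$

FIGURE 9.42: Calibration of the SABR model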


Remark 110 We notice that the vega is defined with respect to the parameter $\alpha$. This
approach is little used in practice, because it is difficult to hedge this model parameter. This
is why traders prefer to compute the vega with respect to the ATM volatility:

$$\upsilon = \frac{\partial C_B}{\partial \Sigma} \cdot \frac{\partial \Sigma_B(T, K)}{\partial \alpha} \cdot \frac{\partial \alpha}{\partial \Sigma_{ATM}}$$

where $\Sigma_{ATM} = \Sigma_B(T, F_0)$.


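Remark 111 Bartlett (2006) proposes a refinement for computing the delta. Indeed, a shift
in $F_0$ produces a shift in $\alpha$, because the two processes $F(t)$ and $\alpha(t)$ are correlated. Since
we have:

$$d\alpha(t) = \nu\alpha(t) dW_2^{\mathbb{Q}}(t) = \nu\alpha(t) \left( \rho\, dW_1^{\mathbb{Q}}(t) + \sqrt{1 - \rho^2}\, dW(t) \right)$$

and:

$$dW_1^{\mathbb{Q}}(t) = \frac{dF(t)}{\alpha(t) F(t)^\beta}$$

we deduce that:

$$d\alpha(t) = \frac{\nu\rho}{F(t)^\beta} dF(t) + \nu\alpha(t) \sqrt{1 - \rho^2}\, dW(t)$$

The new delta is then:

$$\Delta^\star = \frac{\partial C_B}{\partial F_0} + \frac{\partial C_B}{\partial \Sigma} \left( \frac{\partial \Sigma_B(T, K)}{\partial F_0} + \frac{\partial \Sigma_B(T, K)}{\partial \alpha} \cdot \frac{\partial \alpha}{\partial F_0} \right) = \frac{\partial C_B}{\partial F_0} + \frac{\partial C_B}{\partial \Sigma} \left( \frac{\partial \Sigma_B(T, K)}{\partial F_0} + \frac{\nu\rho}{F(t)^\beta} \cdot \frac{\partial \Sigma_B(T, K)}{\partial \alpha} \right) = \Delta + \frac{\nu\rho}{F(t)^\beta} \upsilon$$

Therefore, this approach is particularly useful when we consider a delta hedging instead of
a delta-vega hedging, since the new delta risk incorporates a part of the vega risk.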
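The three sensitivities can be approximated by finite differences applied to the Black formula and to the SABR implied volatility formula. The sketch below is only illustrative: the discount factor is set to one, the bump sizes are arbitrary, and the function names are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(f, k, vol, t):
    """Undiscounted Black call price on a forward."""
    d1 = (math.log(f / k) + 0.5 * vol ** 2 * t) / (vol * math.sqrt(t))
    return f * norm_cdf(d1) - k * norm_cdf(d1 - vol * math.sqrt(t))

def sabr_vol(k, f0, t, alpha, beta, nu, rho):
    """Approximate SABR implied volatility (same formula as in the text)."""
    one_b = 1.0 - beta
    fk, log_fk = (f0 * k) ** (one_b / 2.0), math.log(f0 / k)
    z = nu / alpha * fk * log_fk
    ratio = 1.0 if abs(z) < 1e-12 else z / math.log(
        (math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
    denom = fk * (1.0 + one_b ** 2 / 24.0 * log_fk ** 2 + one_b ** 4 / 1920.0 * log_fk ** 4)
    corr = 1.0 + (one_b ** 2 * alpha ** 2 / (24.0 * fk ** 2) + rho * alpha * nu * beta / (4.0 * fk)
                  + (2.0 - 3.0 * rho ** 2) * nu ** 2 / 24.0) * t
    return alpha / denom * ratio * corr

def sabr_sensitivities(k, f0, t, alpha, beta, nu, rho, rel_bump=1e-4):
    """Return (delta, vega with respect to alpha, Bartlett delta) by central differences."""
    vol = sabr_vol(k, f0, t, alpha, beta, nu, rho)
    hf, ha, hv = rel_bump * f0, rel_bump * alpha, rel_bump * vol
    dc_df = (black_call(f0 + hf, k, vol, t) - black_call(f0 - hf, k, vol, t)) / (2.0 * hf)
    dc_dvol = (black_call(f0, k, vol + hv, t) - black_call(f0, k, vol - hv, t)) / (2.0 * hv)
    dvol_df = (sabr_vol(k, f0 + hf, t, alpha, beta, nu, rho)
               - sabr_vol(k, f0 - hf, t, alpha, beta, nu, rho)) / (2.0 * hf)
    dvol_da = (sabr_vol(k, f0, t, alpha + ha, beta, nu, rho)
               - sabr_vol(k, f0, t, alpha - ha, beta, nu, rho)) / (2.0 * ha)
    delta = dc_df + dc_dvol * dvol_df
    vega = dc_dvol * dvol_da
    return delta, vega, delta + nu * rho / f0 ** beta * vega

delta, vega, bartlett = sabr_sensitivities(0.05, 0.05, 1.0, 0.08944, 1.0, 0.322, -0.229)
```

With a negative correlation and a positive vega, the Bartlett delta is lower than the standard SABR delta, since a rise in the forward rate is on average accompanied by a fall in $\alpha$.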


9.2.5 Factor models 


Factor models are extensively used for modeling fixed income derivatives (Vasicek, CIR,
HJM, etc.). They assume that interest rates are linked to some factors $X(t)$, which can be
observable or not observable. For instance, the factor is directly the instantaneous interest
rate $r(t)$ in Vasicek or CIR models. However, a one-factor model is generally limited and
is not rich enough to fit the yield curve and the basic asset prices (caplets and swaptions).
For a long time, academics have developed multi-factor models by considering explicit
factors (level, slope, convexity, etc.). For instance, Brennan and Schwartz (1979) consider
the short-term interest rate and the long-term interest rate, whereas Longstaff and Schwartz
(1992) use the short-term interest rate and its volatility. Today, this type of approach is
outdated and is replaced by a more pragmatic approach based on non-explicit factors.


9.2.5.1 Linear and quadratic Gaussian models 
Let us assume that the instantaneous interest rate $r(t)$ is linked to the factors $X(t)$
under the risk-neutral probability $\mathbb{Q}$ as follows:

$$r(t) = \alpha(t) + \beta(t)^\top X(t) + X(t)^\top \Gamma(t) X(t)$$

where $\alpha(t)$ is a scalar, $\beta(t)$ is a $n \times 1$ vector and $\Gamma(t)$ is a $n \times n$ matrix. This parametrization
encompasses different specific cases: one-factor model, affine model and quadratic model⁷¹.
We also assume that the factors follow an Ornstein-Uhlenbeck process:

$$dX(t) = (a(t) + B(t) X(t)) dt + \Sigma(t) dW^{\mathbb{Q}}(t)$$

where $a(t)$ is a $n \times 1$ vector, $B(t)$ is a $n \times n$ matrix, $\Sigma(t)$ is a $n \times n$ matrix and $W^{\mathbb{Q}}(t)$ is
a standard $n$-dimensional Brownian motion.


El Karoui et al. (1992a) show that there exists a family of $\hat\alpha(t, T)$, $\hat\beta(t, T)$ and $\hat\Gamma(t, T)$
such that the price of the zero-coupon bond $B(t, T)$ is given by:

$$B(t, T) = \exp\left( -\hat\alpha(t, T) - \hat\beta(t, T)^\top X(t) - X(t)^\top \hat\Gamma(t, T) X(t) \right)$$


71 As shown by Filipović (2002), it is not necessary to use higher order because the only consistent
polynomial term structure approaches are the affine and quadratic term structure models.
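where $\hat\alpha(t, T)$, $\hat\beta(t, T)$ and $\hat\Gamma(t, T)$ solve a system of Riccati equations. If we assume that
the matrix $\hat\Gamma(t, T)$ is symmetric, we obtain⁷²:

$$\partial_t \hat\alpha(t, T) = -\mathrm{tr}\left( \Sigma(t) \Sigma(t)^\top \hat\Gamma(t, T) \right) - \hat\beta(t, T)^\top a(t) + \frac{1}{2} \hat\beta(t, T)^\top \Sigma(t) \Sigma(t)^\top \hat\beta(t, T) - \alpha(t)$$
$$\partial_t \hat\beta(t, T) = -B(t)^\top \hat\beta(t, T) + 2\hat\Gamma(t, T) \Sigma(t) \Sigma(t)^\top \hat\beta(t, T) - 2\hat\Gamma(t, T) a(t) - \beta(t)$$
$$\partial_t \hat\Gamma(t, T) = 2\hat\Gamma(t, T) \Sigma(t) \Sigma(t)^\top \hat\Gamma(t, T) - 2\hat\Gamma(t, T) B(t) - \Gamma(t)$$

with the boundary conditions $\hat\alpha(T, T) = \hat\beta(T, T) = \hat\Gamma(T, T) = 0$. We notice that the
expression of the forward interest rate $F(t, T_1, T_2)$ is given by:

$$F(t, T_1, T_2) = \frac{1}{T_2 - T_1} \ln\frac{B(t, T_1)}{B(t, T_2)} = \frac{\hat\alpha(t, T_2) - \hat\alpha(t, T_1)}{T_2 - T_1} + \frac{\left( \hat\beta(t, T_2) - \hat\beta(t, T_1) \right)^\top X(t)}{T_2 - T_1} + \frac{X(t)^\top \left( \hat\Gamma(t, T_2) - \hat\Gamma(t, T_1) \right) X(t)}{T_2 - T_1}$$

We deduce that the instantaneous forward rate is equal to:

$$f(t, T) = \alpha(t, T) + \beta(t, T)^\top X(t) + X(t)^\top \Gamma(t, T) X(t)$$

where $\alpha(t, T) = \partial_T \hat\alpha(t, T)$, $\beta(t, T) = \partial_T \hat\beta(t, T)$ and $\Gamma(t, T) = \partial_T \hat\Gamma(t, T)$. It follows that
$\alpha(t) = \alpha(t, t) = \partial_T \hat\alpha(t, t)$, $\beta(t) = \beta(t, t) = \partial_T \hat\beta(t, t)$ and $\Gamma(t) = \Gamma(t, t) = \partial_T \hat\Gamma(t, t)$.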
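In the one-factor case with constant coefficients, the Riccati system can be integrated numerically in the time to maturity $\tau = T - t$, which changes the sign of each time derivative. The sketch below uses a classical fourth-order Runge-Kutta scheme; the scalar restriction, the constant coefficients and the function name are assumptions made to keep the example short. The Vasicek model, obtained with $r(t) = X(t)$, provides a closed-form benchmark.

```python
import math

def riccati_coefficients(tau, a, b_mat, sig, alpha, beta, gamma, n_steps=1000):
    """Integrate (alpha_hat, beta_hat, gamma_hat) from 0 to tau for a scalar factor."""
    def rhs(state):
        ah, bh, gh = state
        d_ah = sig ** 2 * gh + a * bh - 0.5 * sig ** 2 * bh ** 2 + alpha
        d_bh = b_mat * bh - 2.0 * sig ** 2 * gh * bh + 2.0 * gh * a + beta
        d_gh = -2.0 * sig ** 2 * gh ** 2 + 2.0 * gh * b_mat + gamma
        return (d_ah, d_bh, d_gh)

    h = tau / n_steps
    y = (0.0, 0.0, 0.0)               # boundary conditions at maturity
    for _ in range(n_steps):
        k1 = rhs(y)
        k2 = rhs(tuple(v + 0.5 * h * k for v, k in zip(y, k1)))
        k3 = rhs(tuple(v + 0.5 * h * k for v, k in zip(y, k2)))
        k4 = rhs(tuple(v + h * k for v, k in zip(y, k3)))
        y = tuple(v + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
                  for v, p, q, r, s in zip(y, k1, k2, k3, k4))
    return y

# Vasicek case: dr = kappa (theta - r) dt + sigma dW, i.e. a = kappa theta, B = -kappa
kappa, theta, sigma, tau = 0.5, 0.03, 0.01, 5.0
alpha_hat, beta_hat, gamma_hat = riccati_coefficients(
    tau, kappa * theta, -kappa, sigma, 0.0, 1.0, 0.0)
zero_coupon = math.exp(-alpha_hat - beta_hat * 0.02)   # bond price when X(t) = 2%
```

For time-dependent or multi-dimensional coefficients, the same scheme applies with matrix products in place of the scalar products.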

Let $V(t, X)$ be the price of the option, whose payoff is $f(X)$. It satisfies the following
PDE:

$$\frac{1}{2} \mathrm{trace}\left( \Sigma(t) \partial_X^2 V(t, X) \Sigma(t)^\top \right) + (a(t) + B(t) X)^\top \partial_X V(t, X) + \partial_t V(t, X) - \left( \alpha(t) + \beta(t)^\top X + X^\top \Gamma(t) X \right) V(t, X) = 0 \tag{9.38}$$

Once we have specified the functions $\alpha(t)$, $\beta(t)$, $\Gamma(t)$, $a(t)$, $B(t)$ and $\Sigma(t)$, we can then price
the option by solving numerically the previous multidimensional PDE with the terminal
condition $V(T, X) = f(X)$. Most of the time, the payoff is not specified with respect to the
state variables $X$, but depends on the interest rate $r(t)$. In this case, we use the following
transformation:

$$f(r) = f\left( \alpha(T) + \beta(T)^\top X + X^\top \Gamma(T) X \right)$$


Remark 112 We can also calculate the price of the option by Monte Carlo methods. This 
approach is generally more efficient when the number of factors is larger than 2. 


72 See Exercise 9.4.10 on page 601 and Ahn et al. (2002) for the derivation of the Riccati equations.

9.2.5.2 Dynamics of risk factors under the forward probability measure


We have: 
dB (t,T) 
Btt,T) 
We deduce that: 


we (t) = W (t) + [> (s)! (2f (s, T) X (s) + Ê (s, T)) ds 


M M T 
= r (t) dt- (2f (t,T) X(t) +Ê (t,7)) E(t) dw (t) 


defines a Brownian motion under Q* (T). It follows that: 
dX (t) = (a(t) + B(t) X (t)) dt + E (t) wWe™ (4) 
where: P 
a(t) =at) -EEEE BT) 


and: 
B(t) = B(t) —25(t)N(t)' P(t, 7) 


We conclude that X (t) is Gaussian under any forward probability measure Q* (T): 
X (t) ~N (m(0,t) , V (0, t)) 


El Karoui et al. (1992a) show that the conditional mean and variance satisfy the following
forward differential equations:

$$\partial_T m(t,T) = a(T) + B(T)m(t,T) - 2V(t,T)\Gamma(T)m(t,T) - V(t,T)\beta(T)$$
$$\partial_T V(t,T) = V(t,T)B(T)^\top + B(T)V(t,T) - 2V(t,T)\Gamma(T)V(t,T) + \Sigma(T)\Sigma(T)^\top$$

If $t$ is equal to zero, the initial conditions are $m(0,0) = X(0) = 0$ and $V(0,0) = 0$. If $t \neq 0$,
we proceed in two steps: first, we calculate numerically the solutions $m(0,t)$ and $V(0,t)$,
and second, we initialize the system with $m(t,t) = m(0,t)$ and $V(t,t) = V(0,t)$.
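
The forward differential equations above can be integrated with any standard ODE scheme. The following is a minimal sketch for the one-factor case ($n = 1$), where $V B^\top + B V$ reduces to $2BV$. The function name `forward_moments`, the Runge-Kutta scheme and the constant parameters of the usage example are illustrative assumptions, not part of the original text.

```python
import math


def forward_moments(a, B, beta, Gamma, sigma, T, n_steps=2000, m0=0.0, V0=0.0):
    """Integrate the scalar forward ODEs for m(0, T) and V(0, T) with RK4.

    a, B, beta, Gamma and sigma are callables of time (hypothetical inputs);
    m0 and V0 are the initial conditions m(0, 0) and V(0, 0).
    """

    def rhs(t, m, V):
        # d/dT m = a + B m - 2 V Gamma m - V beta
        dm = a(t) + B(t) * m - 2.0 * V * Gamma(t) * m - V * beta(t)
        # d/dT V = 2 B V - 2 Gamma V^2 + sigma^2  (scalar version)
        dV = 2.0 * B(t) * V - 2.0 * Gamma(t) * V * V + sigma(t) ** 2
        return dm, dV

    h = T / n_steps
    t, m, V = 0.0, m0, V0
    for _ in range(n_steps):
        k1m, k1V = rhs(t, m, V)
        k2m, k2V = rhs(t + h / 2, m + h / 2 * k1m, V + h / 2 * k1V)
        k3m, k3V = rhs(t + h / 2, m + h / 2 * k2m, V + h / 2 * k2V)
        k4m, k4V = rhs(t + h, m + h * k3m, V + h * k3V)
        m += h / 6 * (k1m + 2 * k2m + 2 * k3m + k4m)
        V += h / 6 * (k1V + 2 * k2V + 2 * k3V + k4V)
        t += h
    return m, V


if __name__ == "__main__":
    # Illustrative constant parameters: a = B = beta = 0, Gamma = 0.5, sigma = 10%
    zero = lambda t: 0.0
    m, V = forward_moments(zero, zero, zero, lambda t: 0.5, lambda t: 0.10, T=2.0)
    print(m, V)
```

With $a = B = \beta = 0$ and constant $\Gamma$ and $\Sigma$, the variance equation becomes $\partial_T V = \Sigma^2 - 2\Gamma V^2$, whose solution is $V(0,T) = \frac{\Sigma}{\sqrt{2\Gamma}} \tanh\left(\Sigma\sqrt{2\Gamma}\,T\right)$, which gives a convenient check of the numerical scheme.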


Remark 113 In fact, the previous forward differential equations are not obtained under the
traditional forward probability measure $\mathbb{Q}^\star(T)$, but under the probability measure $\mathbb{Q}^\star(t,T)$
defined by the following Radon-Nikodym derivative:
$$\frac{d\mathbb{Q}^\star(t,T)}{d\mathbb{Q}} = e^{-\int_0^t r(s)\,ds - \int_t^T f(t,s)\,ds}$$

The reason is that we would like to price at time $t$ any caplet with maturity $T$. Therefore,
it is the maturity $T$, and not the filtration $\mathcal{F}_t$, that moves.


9.2.5.3 Pricing caplets and swaptions 
We reiterate that the formula of the Libor rate $L(t,T_{i-1},T_i)$ at time $t$ between the dates
$T_{i-1}$ and $T_i$ is:
$$L(t,T_{i-1},T_i) = \frac{1}{T_i - T_{i-1}}\left(\frac{B(t,T_{i-1})}{B(t,T_i)} - 1\right)$$
It follows that the price of the caplet is given by:

$$\text{Caplet} = B(0,t)\,\mathbb{E}^{\mathbb{Q}^\star(t)}\left[\left(B(t,T_{i-1}) - \left(1 + (T_i - T_{i-1})K\right)B(t,T_i)\right)^+\right]$$

where $\mathbb{Q}^\star(t)$ is the forward probability measure. We can then calculate the price using two
approaches:
1. we can solve the partial differential equation;
2. we can calculate the mathematical expectation using numerical integration. 


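In the first approach, we consider the PDE (9.38) with the following payoff:
$$f(X) = \max\left(0, g(X)\right)$$
where $\delta_{i-1} = T_i - T_{i-1}$ and:
$$g(X) = \exp\left(-\hat\alpha(t,T_{i-1}) - \hat\beta(t,T_{i-1})^\top X - X^\top\hat\Gamma(t,T_{i-1})X\right) - \left(1 + \delta_{i-1}K\right)\exp\left(-\hat\alpha(t,T_i) - \hat\beta(t,T_i)^\top X - X^\top\hat\Gamma(t,T_i)X\right)$$

In the second approach, we have $X(t) \sim \mathcal{N}(m(0,t), V(0,t))$ under the forward probability
$\mathbb{Q}^\star(t)$. We deduce that:

$$\text{Caplet}(t,T_{i-1},T_i) = B(0,t)\int f(x)\,\phi_n\left(x; m(0,t), V(0,t)\right) dx$$

This integral can be computed numerically using Gauss-Legendre quadrature methods.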


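For the swaption, the payoff is:

$$\begin{aligned}
f(X) &= \left(\text{Sw}(T_0) - K\right)^+ \sum_{i=1}^n (T_i - T_{i-1})\,B(T_0,T_i) \\
&= \left(B(T_0,T_0) - B(T_0,T_n) - K\sum_{i=1}^n (T_i - T_{i-1})\,B(T_0,T_i)\right)^+ \\
&= \max\left(0, g(X)\right)
\end{aligned}$$
where:
$$\begin{aligned}
g(X) = {} & \exp\left(-\hat\alpha(T_0,T_0) - \hat\beta(T_0,T_0)^\top X - X^\top\hat\Gamma(T_0,T_0)X\right) - {} \\
& \exp\left(-\hat\alpha(T_0,T_n) - \hat\beta(T_0,T_n)^\top X - X^\top\hat\Gamma(T_0,T_n)X\right) - {} \\
& K\sum_{i=1}^n (T_i - T_{i-1})\exp\left(-\hat\alpha(T_0,T_i) - \hat\beta(T_0,T_i)^\top X - X^\top\hat\Gamma(T_0,T_i)X\right)
\end{aligned}$$

As previously, we can price the swaption by solving the PDE with the payoff $f(X)$ or by
calculating the following integral:

$$\text{Swaption} = B(0,T_0)\int f(x)\,\phi_n\left(x; m(0,T_0), V(0,T_0)\right) dx$$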


9.2.5.4 Calibration and practice of factor models 


The calibration of the model consists in fitting the functions $\alpha(t)$, $\beta(t)$, $\Gamma(t)$, $a(t)$,
$B(t)$ and $\Sigma(t)$. Generally, professionals assume that $a(t) = 0$ and $B(t) = 0$. Indeed, if we
consider the following transformation:

$$\tilde X(t) = e^{-\int_0^t B(s)\,ds}X(t) - \int_0^t e^{-\int_0^s B(u)\,du}a(s)\,ds$$
we obtain:

$$d\tilde X(t) = e^{-\int_0^t B(s)\,ds}\,\Sigma(t)\,dW^{\mathbb{Q}}(t) = \tilde\Sigma(t)\,dW^{\mathbb{Q}}(t)$$


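Without loss of generality, we can then set $dX(t) = \Sigma(t)\,dW^{\mathbb{Q}}(t)$, and the Riccati equations
are simplified as follows:

$$\begin{aligned}
\partial_t\hat\alpha(t,T) &= -\operatorname{tr}\left(\Sigma(t)\Sigma(t)^\top\hat\Gamma(t,T)\right) + \frac{1}{2}\hat\beta(t,T)^\top\Sigma(t)\Sigma(t)^\top\hat\beta(t,T) - \alpha(t) \\
\partial_t\hat\beta(t,T) &= 2\hat\Gamma(t,T)^\top\Sigma(t)\Sigma(t)^\top\hat\beta(t,T) - \beta(t) \\
\partial_t\hat\Gamma(t,T) &= 2\hat\Gamma(t,T)\Sigma(t)\Sigma(t)^\top\hat\Gamma(t,T) - \Gamma(t)
\end{aligned}$$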


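If we consider an affine model, we retrieve the formula of Duffie and Huang (1996):
$$B(t,T) = \exp\left(-\hat\alpha(t,T) - \hat\beta(t,T)^\top X(t)\right)$$

where⁷³:
$$\begin{cases}
\partial_t\hat\alpha(t,T) = \frac{1}{2}\hat\beta(t,T)^\top\Sigma(t)\Sigma(t)^\top\hat\beta(t,T) - \alpha(t) \\
\partial_t\hat\beta(t,T) = -\beta(t)
\end{cases}$$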


First, we must fit the initial yield curve, which is noted $B(0,T)$. If we assume that
$X(0) = 0$, we obtain:
$$\hat\alpha(t,T) = -\ln\frac{B(0,T)}{B(0,t)}$$

We notice that the computation of $\hat\alpha(t,T)$ allows us to define $\alpha(t)$:

$$\alpha(t) = -\operatorname{tr}\left(\Sigma(t)\Sigma(t)^\top\hat\Gamma(t,T)\right) + \frac{1}{2}\hat\beta(t,T)^\top\Sigma(t)\Sigma(t)^\top\hat\beta(t,T) - \partial_t\hat\alpha(t,T)$$

because $\partial_t\hat\alpha(t,T)$ can be calculated using finite differences. Therefore, the problem dimension
is reduced and the calibration depends on $\beta(t)$, $\Gamma(t)$ and $\Sigma(t)$. In order to calibrate
these functions, we need to fit other products like caplets and swaptions. We have shown
that these products can be priced using numerical integration. Therefore, the calibration of
$\beta(t)$, $\Gamma(t)$ and $\Sigma(t)$ can be done without solving the PDE, which is time-consuming.


Let us now see what type of volatility smile is generated by quadratic and linear Gaussian
factor models. We assume that the functions $\beta(t)$, $\Gamma(t)$ and $\Sigma(t)$ are piecewise constant
functions, whose knots are $t_1^\star = 0.5$ and $t_2^\star = 1$. For instance, the function $\beta(t)$ is given by:

$$\beta(t) = \begin{cases}
\beta_1 & \text{if } t \in [0, 0.5) \\
\beta_2 & \text{if } t \in [0.5, 1) \\
\beta_3 & \text{if } t \in [1, \infty)
\end{cases}$$

where $\beta_1$, $\beta_2$ and $\beta_3$ are three scalars. Therefore, $\beta(t)$ is defined by the vector $(\beta_1, \beta_2, \beta_3)$.
In a similar way, $\Gamma(t)$ and $\Sigma(t)$ are defined by the vectors $(\Gamma_1, \Gamma_2, \Gamma_3)$ and $(\Sigma_1, \Sigma_2, \Sigma_3)$. We


73 In the general case $a(t) \neq 0$ and $B(t) \neq 0$, we have:

$$\begin{cases}
\partial_t\hat\alpha(t,T) = -\hat\beta(t,T)^\top a(t) + \frac{1}{2}\hat\beta(t,T)^\top\Sigma(t)\Sigma(t)^\top\hat\beta(t,T) - \alpha(t) \\
\partial_t\hat\beta(t,T) = -B(t)^\top\hat\beta(t,T) - \beta(t)
\end{cases}$$
FIGURE 9.43: Volatility smiles generated by the quadratic Gaussian model


consider 4 parameter sets⁷⁴:

Set   $(\beta_1, \beta_2, \beta_3)$   $(\Gamma_1, \Gamma_2, \Gamma_3)$   $(\Sigma_1, \Sigma_2, \Sigma_3)$
#1    (0.3, 0.4, 0.5)                 (-20, -10, 10)                     (8, 3.2, 3.5)
#2    (0.3, 0.4, 0.5)                 (20, 15, 10)                       (3, 3.2, 3.5)
#3    (0.3, 0.4, 0.5)                 (5, 5, 5)                          (4, 3.5, 3)
#4    (0.3, 0.4, 0.5)                 (-10, -10, -10)                    (6, 5, 4)


We also assume that the yield curve is flat and is equal to 5%. We consider the pricing of a
caplet with $T_0 = T_1 - 2/365$, $T_1 = 0.5$ and $T_2 = 1.5$ for different strikes $K_i = K_i^\star \cdot \text{Sw}(T_0)$
where $K_i^\star \in [0.8, 1.2]$. In Figure 9.43, we have reported the implied Black volatilities (in %)
generated by the quadratic Gaussian model with the four parameter sets. We notice that
the quadratic Gaussian model can generate different forms of volatility smiles. Since it is
a little more flexible than the linear Gaussian model, we can obtain U-shaped and even
reverse U-shaped volatility smiles.


9.3 Other model risk topics 


In this section, we consider other risks than the volatility risk. In particular, we study 
the impact of dividends on option premia, the pricing of basket options and the liquidity 
risk. 


74 The volatilities $(\Sigma_1, \Sigma_2, \Sigma_3)$ are normalized by the factor $\sqrt{260} \times 10^{-4}$.
9.3.1 Dividend risk
9.3.1.1 Understanding the impact of dividends on option prices 


Let us consider that the underlying asset pays a continuous dividend yield $d$ during the
life of the option. We have seen that the risk-neutral dynamics become:

$$dS(t) = (r - d)\,S(t)\,dt + \sigma S(t)\,dW^{\mathbb{Q}}(t)$$
We deduce that the Black-Scholes formula is equal to:
$$C_0 = S_0 e^{-dT}\Phi(d_1) - K e^{-rT}\Phi(d_2)$$

where:

$$d_1 = \frac{1}{\sigma\sqrt{T}}\left(\ln\frac{S_0}{K} + (r - d)T\right) + \frac{1}{2}\sigma\sqrt{T}$$
and:
$$d_2 = d_1 - \sigma\sqrt{T}$$
We can also show that $\lim_{d\to\infty} C_0 = 0$. In Figure 9.44, we report the price of the option
when $K = 100$, $\sigma = 20\%$, $r = 5\%$ and $T = 0.5$. We consider different levels of the dividend
yield $d$. We notice that the call price is a decreasing function of the continuous dividend. If
we consider put options instead of call options, the function becomes increasing.
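
As an illustration, the Black-Scholes formula with a continuous dividend yield can be sketched in a few lines. The function names `bs_call_dividend` and `bs_put_dividend` are hypothetical, the put is obtained by put-call parity, and the spot value $S_0 = 100$ used in the usage example is an assumption (the text only fixes $K$, $\sigma$, $r$ and $T$).

```python
from math import exp, log, sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def bs_call_dividend(S0, K, sigma, T, r, d):
    """Black-Scholes call price with a continuous dividend yield d."""
    d1 = (log(S0 / K) + (r - d) * T) / (sigma * sqrt(T)) + 0.5 * sigma * sqrt(T)
    d2 = d1 - sigma * sqrt(T)
    return S0 * exp(-d * T) * _PHI(d1) - K * exp(-r * T) * _PHI(d2)


def bs_put_dividend(S0, K, sigma, T, r, d):
    """Put price deduced from put-call parity: P = C - S0 e^{-dT} + K e^{-rT}."""
    call = bs_call_dividend(S0, K, sigma, T, r, d)
    return call - S0 * exp(-d * T) + K * exp(-r * T)


if __name__ == "__main__":
    # S0 = 100 is an assumed spot value; the other parameters follow the text
    for d in (0.0, 0.02, 0.05, 0.10):
        c = bs_call_dividend(100.0, 100.0, 0.20, 0.5, 0.05, d)
        p = bs_put_dividend(100.0, 100.0, 0.20, 0.5, 0.05, d)
        print(f"d = {d:.2f}  call = {c:.4f}  put = {p:.4f}")
```

Running the loop shows the call price falling and the put price rising as $d$ increases, which is the behavior described above.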


FIGURE 9.44: Impact of dividends on the call option price


The impact of dividends is generally explained by the fact that stock prices fall by the
amount of the dividend on the ex-dividend date. Let $S(t)$ denote the value of the underlying
asset at time $t$ and $D$ the discrete dividend paid at time $t_D$. We have:

$$S(t_D) = S(t_D^-) - D$$
The impact on the payoff is not the only effect. Indeed, we recall that the option price is
the cost of the replication portfolio. When the trader hedges the call option, he has a long 
exposure on the asset since the delta is positive. This implies that he receives the dividend 
of the asset. Therefore, the hedging cost of the call option is reduced. In the case of a put 
option, the trader has a short exposure and has to pay the dividend. As a result, the hedging 
cost of the put option is increased. 


9.3.1.2 Models of discrete dividends 


We denote by $S(t)$ the market price and $Y(t)$ an additional process that is assumed to
be a geometric Brownian motion:

$$dY(t) = rY(t)\,dt + \sigma Y(t)\,dW^{\mathbb{Q}}(t)$$
Following Frishling (2002), there are three main approaches to take into account discrete
dividends. In the first approach, $Y(t)$ is the capital price process excluding the dividends
and the market price $S(t)$ is equal to the sum of the capital price and the discounted value
of future dividends:

$$S(t) = Y(t) + \sum_{t_k \in [t,T]} D(t_k)\,e^{-r(t_k - t)}$$

To price European options, we then replace the price $S_0$ by the adjusted price $Y_0 =
S_0 - \sum_{t_k \leq T} D(t_k)\,e^{-r t_k}$. In the second approach, we define $D(t)$ as the sum of capitalized
dividends paid until time $t$:

$$D(t) = \sum_{k} \mathbb{1}\{t_k < t\} \cdot D(t_k)\,e^{r(t - t_k)}$$

The market price $S(t)$ is equal to the difference between the cum-dividend price $Y(t)$ and
the capitalized dividends (Haug et al., 2003):

$$S(t) = Y(t) - D(t)$$

We deduce that:

$$\begin{aligned}
\left(S(T) - K\right)^+ &= \left(Y(T) - D(T) - K\right)^+ \\
&= \left(Y(T) - (K + D(T))\right)^+ \\
&= \left(Y(T) - K'\right)^+
\end{aligned}$$

In the case of European options, we replace the strike $K$ by the adjusted strike $K' =
K + \sum_{t_k \leq T} D(t_k)\,e^{r(T - t_k)}$. The last approach considers the market price process as a
discontinuous process:

$$\begin{cases}
dS(t) = rS(t)\,dt + \sigma S(t)\,dW^{\mathbb{Q}}(t) & \text{if } t_{k-1} < t < t_k \\
S(t) = S(t^-) - D(t_k) & \text{if } t = t_k
\end{cases}$$
In this case, we calculate the option price using finite differences or Monte Carlo simulations.
Remark 114 The three models can be used to price exotic options, and not only European
options. Generally, we do not have closed-form formulas and we calculate the price with numerical
methods. For that, we have to define the risk-neutral dynamics of $S(t)$. For instance,
we have for the second model⁷⁵:

$$dS(t) = \left(rS(t) - \sum_k \mathbb{1}\{t_k = t\} \cdot D(t_k)\right) dt + \sigma\left(S(t) + D(t)\right) dW^{\mathbb{Q}}(t)$$

75 We notice that:

$$dD(t) = \left(rD(t) + \sum_k \mathbb{1}\{t_k = t\} \cdot D(t_k)\right) dt$$
Example 97 We assume that $S_0 = 100$, $K = 100$, $\sigma = 30\%$, $T = 1$, $r = 5\%$ and $b = 5\%$.
A dividend $D(t_1)$ will be paid at time $t_1 = 0.5$.


Table 9.13 compares option prices when we use the three previous models. When D (t1) 
is equal to zero, the three models give the same price: the call option is equal to 14.23 
whereas the put option is equal to 9.35. When the asset pays a dividend, the three models 
give different option prices. For instance, if the dividend is equal to 3, the call option is equal 
to 12.46 for Model #1, 12.81 for Model #2 and 12.69 for Model #3. We notice that the 
three models produce very different option prices⁷⁶. Therefore, the choice of the dividend
model has a big impact on the pricing of derivatives. 


TABLE 9.13: Impact of the dividend on the option price

              Call                       Put
$D(t_1)$   (#1)    (#2)    (#3)      (#1)    (#2)    (#3)
 0         14.23   14.23   14.23      9.35    9.35    9.35
 3         12.46   12.81   12.69     10.51   10.86   10.64
 5         11.34   11.92   11.69     11.34   11.92   11.59
10          8.78    9.93    9.42     13.66   14.80   14.20
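
The first two dividend models only require the standard Black-Scholes formula with an adjusted spot or an adjusted strike, so they are easy to sketch. The code below is an illustrative implementation under the parameters of Example 97 (with $b = r$); the function names are hypothetical, and the third model is not covered because it requires finite differences or Monte Carlo simulations.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def bs_price(S0, K, sigma, T, r, call=True):
    """Black-Scholes price of a European call (or put) without dividends."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    c = S0 * _PHI(d1) - K * exp(-r * T) * _PHI(d2)
    return c if call else c - S0 + K * exp(-r * T)  # put-call parity


def price_model1(S0, K, sigma, T, r, dividends, call=True):
    """Model #1: replace S0 by S0 minus the discounted dividends paid before T.

    dividends is a list of (t_k, D_k) pairs.
    """
    Y0 = S0 - sum(D * exp(-r * t) for t, D in dividends if t <= T)
    return bs_price(Y0, K, sigma, T, r, call)


def price_model2(S0, K, sigma, T, r, dividends, call=True):
    """Model #2: replace K by K plus the dividends capitalized up to T."""
    K_adj = K + sum(D * exp(r * (T - t)) for t, D in dividends if t <= T)
    return bs_price(S0, K_adj, sigma, T, r, call)


if __name__ == "__main__":
    for D in (0.0, 3.0, 5.0, 10.0):
        divs = [(0.5, D)]
        row = [
            price_model1(100, 100, 0.30, 1.0, 0.05, divs, call=True),
            price_model2(100, 100, 0.30, 1.0, 0.05, divs, call=True),
            price_model1(100, 100, 0.30, 1.0, 0.05, divs, call=False),
            price_model2(100, 100, 0.30, 1.0, 0.05, divs, call=False),
        ]
        print(D, [round(x, 2) for x in row])
```

The printed values can be compared with columns (#1) and (#2) of Table 9.13.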


Remark 115 The previous models assume that dividends are not random at the inception 
date of the option. In practice, only the first dividend can be known if it has been announced 
before the inception date. This implies that dividends are generally unknown. Some authors 
have proposed option models with stochastic dividends, but they are not used by professionals. 
Most of the time, they use a very basic model. For instance, the Gordon growth model 
assumes that dividends increase at a constant rate g: 


$$D(t_k) = (1 + g)^{t_k - t_1} D(t_1)$$


The parameter g can be calibrated in order to match the forward prices. 


9.3.2 Correlation risk 


Until now, we have studied the pricing and hedging of options that are based on one 
underlying asset. Banks have also developed derivatives with several underlying assets. In 
this case, the option price is sensitive to the covariance risk, which may be split between 
volatility risk and correlation risk. Here, we face two issues: the determination of implied 
correlations, and the hedging of the correlation risk. 


9.3.2.1 The two-asset case 


Pricing of basket options We consider the example of a basket option on two assets.
Let $S_i(t)$ be the price process of asset $i$ at time $t$. According to the Black-Scholes model,
we have:
$$\begin{cases}
dS_1(t) = b_1 S_1(t)\,dt + \sigma_1 S_1(t)\,dW_1^{\mathbb{Q}}(t) \\
dS_2(t) = b_2 S_2(t)\,dt + \sigma_2 S_2(t)\,dW_2^{\mathbb{Q}}(t)
\end{cases}$$
where $b_i$ and $\sigma_i$ are the cost-of-carry and the volatility of asset $i$. Under the risk-neutral
probability measure $\mathbb{Q}$, $W_1^{\mathbb{Q}}(t)$ and $W_2^{\mathbb{Q}}(t)$ are two correlated Brownian motions:

$$\mathbb{E}\left[W_1^{\mathbb{Q}}(t)\,W_2^{\mathbb{Q}}(t)\right] = \rho\,t$$

76 We also notice that the price given by the third model is between the two prices calculated with the
first and second models.
The option price associated with the payoff $(\alpha_1 S_1(T) + \alpha_2 S_2(T) - K)^+$ is the solution of the
two-dimensional PDE:

$$\frac{1}{2}\sigma_1^2 S_1^2\,\partial_{S_1}^2 C + \frac{1}{2}\sigma_2^2 S_2^2\,\partial_{S_2}^2 C + \rho\sigma_1\sigma_2 S_1 S_2\,\partial_{S_1,S_2}^2 C + b_1 S_1\,\partial_{S_1} C + b_2 S_2\,\partial_{S_2} C + \partial_t C - rC = 0$$
with the terminal condition:

$$C(T, S_1, S_2) = (\alpha_1 S_1 + \alpha_2 S_2 - K)^+$$

Using the Feynman-Kac representation theorem, we have:

$$C_0 = \mathbb{E}^{\mathbb{Q}}\left[e^{-\int_0^T r\,dt}\left(\alpha_1 S_1(T) + \alpha_2 S_2(T) - K\right)^+\right]$$


The value $C_0$ can be calculated using numerical integration techniques such as Gauss-Legendre
or Gauss-Hermite quadrature methods. In some cases, the two-dimensional problem
can be reduced to one-dimensional integration. For instance, if $\alpha_1 < 0$, $\alpha_2 > 0$ and
$K > 0$, we obtain⁷⁷:

$$C_0 = \int_{\mathbb{R}} \text{BS}\left(S^\star(x), K^\star(x), \sigma^\star, T, b^\star, r\right) \phi(x)\,dx$$
where $S^\star(x) = \alpha_2 S_2(0)\,e^{\rho\sigma_2\sqrt{T}x}$, $K^\star(x) = K - \alpha_1 S_1(0)\,e^{\left(b_1 - \frac{1}{2}\sigma_1^2\right)T + \sigma_1\sqrt{T}x}$, $\sigma^\star =
\sigma_2\sqrt{1 - \rho^2}$ and $b^\star = b_2 - \frac{1}{2}\rho^2\sigma_2^2$.
Example 98 We assume that $S_1(0) = S_2(0) = 100$, $\sigma_1 = \sigma_2 = 20\%$, $b_1 = 10\%$, $b_2 = 0$ and
$r = 5\%$. We calculate the price of a basket option, whose maturity $T$ is equal to one year.

For the other characteristics $(\alpha_1, \alpha_2, K)$, we consider different sets of parameters: $(1, -1, 1)$,
$(1, -1, 5)$, $(0.5, 0.5, 100)$, $(0.5, 0.5, 110)$ and $(0.1, 0.1, -5)$.


TABLE 9.14: Impact of the correlation on the basket option price

$\alpha_1$        1.0     1.0     0.5     0.5     0.1
$\alpha_2$       -1.0    -1.0     0.5     0.5     0.1
$K$                 1       5     100     110      -5

$\rho = -0.90$  20.41   18.23    5.39    0.66   24.78
$\rho = -0.75$  19.81   17.62    6.06    1.35   24.78
$\rho = -0.50$  18.76   16.55    6.97    2.31   24.78
$\rho = -0.25$  17.61   15.37    7.73    3.12   24.78
$\rho = 0.00$   16.35   14.08    8.39    3.83   24.78
$\rho = 0.25$   14.94   12.61    8.99    4.46   24.78
$\rho = 0.50$   13.30   10.88    9.54    5.05   24.78
$\rho = 0.75$   11.29    8.66   10.05    5.59   24.78
$\rho = 0.90$    9.78    6.81   10.34    5.90   24.78


Using Gauss-Legendre quadratures, we obtain the prices of the basket option given in 
Table 9.14. We notice that the price can be an increasing, decreasing or independent function 
of the correlation parameter p. 
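
The one-dimensional reduction can be sketched as follows for the case $\alpha_1 < 0$, $\alpha_2 > 0$ and $K > 0$. This is only an illustration: the function names are hypothetical, and a simple trapezoidal rule on $[-8, 8]$ is used in place of the Gauss-Legendre quadrature mentioned in the text. For the spread payoff $(S_1(T) - S_2(T) - 1)^+$ of Example 98, the roles of the two assets are swapped so that the asset with the negative weight comes first.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution function


def bs_general(S, K, sigma, T, b, r):
    """Generalized Black-Scholes call price with cost-of-carry b."""
    d1 = (log(S / K) + (b + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * exp((b - r) * T) * _PHI(d1) - K * exp(-r * T) * _PHI(d2)


def basket_call_1d(a1, S1, b1, sig1, a2, S2, b2, sig2, K, rho, T, r,
                   n=2001, x_max=8.0):
    """Price of (a1 S1(T) + a2 S2(T) - K)^+ when a1 < 0, a2 > 0 and K > 0.

    The integral over x is computed with the trapezoidal rule (an assumption
    made for simplicity, not the quadrature used in the text).
    """
    if not (a1 < 0 < a2 and K > 0 and abs(rho) < 1):
        raise ValueError("requires a1 < 0, a2 > 0, K > 0 and |rho| < 1")
    sig_star = sig2 * sqrt(1.0 - rho ** 2)
    b_star = b2 - 0.5 * rho ** 2 * sig2 ** 2
    h = 2.0 * x_max / (n - 1)
    total = 0.0
    for i in range(n):
        x = -x_max + i * h
        S_star = a2 * S2 * exp(rho * sig2 * sqrt(T) * x)
        K_star = K - a1 * S1 * exp((b1 - 0.5 * sig1 ** 2) * T + sig1 * sqrt(T) * x)
        density = exp(-0.5 * x * x) / sqrt(2.0 * pi)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * bs_general(S_star, K_star, sig_star, T, b_star, r) * density
    return h * total


if __name__ == "__main__":
    # Spread option (S1 - S2 - 1)^+ of Example 98: the short asset (b = 0) is
    # passed first, the long asset (b = 10%) second.
    for rho in (-0.90, -0.50, 0.0, 0.50, 0.90):
        price = basket_call_1d(a1=-1.0, S1=100.0, b1=0.0, sig1=0.20,
                               a2=1.0, S2=100.0, b2=0.10, sig2=0.20,
                               K=1.0, rho=rho, T=1.0, r=0.05)
        print(f"rho = {rho:+.2f}  price = {price:.2f}")
```

The output can be compared with the first column of Table 9.14.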


77 See Exercise 9.4.11 on page 602.
Remark 116 We can extend the previous framework to other payoff functions. The PDE
is the same, only the terminal condition changes:

$$C(T, S_1, S_2) = f\left(S_1(T), S_2(T)\right)$$
where $f(S_1(T), S_2(T))$ is the payoff function.


Cega sensitivity The correlation risk studies the impact of the parameter $\rho$ on the option
price $C_0$. For instance, Rapuch and Roncalli (2004) show that the price of the spread option,
whose payoff is $(S_1(T) - S_2(T) - K)^+$, is a decreasing function of the correlation parameter
$\rho$. They also extend this result to an arbitrary European payoff $f(S_1(T), S_2(T))$. In
particular, they demonstrate that, if the cross-derivative $\partial_{1,2}^2 f$ is a negative (resp. positive)
measure, then the option price is decreasing (resp. increasing) with respect to $\rho$. For instance,
the payoff function of the call option on the maximum of two assets is defined as
$f(S_1, S_2) = (\max(S_1, S_2) - K)^+$. Since $\partial_{1,2}^2 f(S_1, S_2) = -\mathbb{1}\{S_1 = S_2, S_1 > K\}$ is a negative
measure, the option price decreases with respect to $\rho$. In the case of a Best-of call/call
option, the payoff function is $f(S_1, S_2) = \max\left((S_1 - K_1)^+, (S_2 - K_2)^+\right)$ and we have:

$$\partial_{1,2}^2 f(S_1, S_2) = -\mathbb{1}\{S_2 - K_2 - S_1 + K_1 = 0, S_1 > K_1, S_2 > K_2\}$$

We have the same behavior as the Max option. For the Min option, we remark that
$\min(S_1, S_2) = S_1 + S_2 - \max(S_1, S_2)$. So, the option price is an increasing function of $\rho$.
Other results can be found in Table 9.15.


TABLE 9.15: Relationship between the basket option price and the correlation parameter $\rho$

Option type          Payoff                                        Increasing                Decreasing
Spread               $(S_2 - S_1 - K)^+$                                                     ✓
Basket               $(\alpha_1 S_1 + \alpha_2 S_2 - K)^+$         $\alpha_1\alpha_2 > 0$    $\alpha_1\alpha_2 < 0$
Max                  $(\max(S_1, S_2) - K)^+$                                                ✓
Min                  $(\min(S_1, S_2) - K)^+$                      ✓
Best-of call/call    $\max\left((S_1 - K_1)^+, (S_2 - K_2)^+\right)$                         ✓
Best-of put/put      $\max\left((K_1 - S_1)^+, (K_2 - S_2)^+\right)$                         ✓
Worst-of call/call   $\min\left((S_1 - K_1)^+, (S_2 - K_2)^+\right)$   ✓
Worst-of put/put     $\min\left((K_1 - S_1)^+, (K_2 - S_2)^+\right)$   ✓

The sensitivity of the option price with respect to the correlation parameter $\rho$ is called
the cega:

$$\boldsymbol{c} = \frac{\partial C_0}{\partial \rho}$$
Generally, it is difficult to fix a particular value of $\rho$, because a correlation is not a stable
parameter. Moreover, the value of $\rho$ used for pricing the option must reflect the risk-neutral
distribution. Then, it is not obvious that the ‘risk-neutral correlation’ is equal to the ‘historical
correlation’. Most of the time, we only have an idea about the correlation range
$\rho \in [\rho^-, \rho^+]$. The previous analysis leads us to define the lower and upper bounds of the
option price when the cega is either positive or negative. We have:

$$C_0 \in \begin{cases}
\left[C_0(\rho^-), C_0(\rho^+)\right] & \text{if } \boldsymbol{c} \geq 0 \\
\left[C_0(\rho^+), C_0(\rho^-)\right] & \text{if } \boldsymbol{c} \leq 0
\end{cases}$$

We can define the conservative price by taking the maximum between $C_0(\rho^-)$ and $C_0(\rho^+)$.
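Remark 117 In the case where $\rho^- = -1$ and $\rho^+ = 1$, the bounds satisfy the one-dimensional
PDE:

$$\begin{cases}
\frac{1}{2}\sigma_1^2 S^2\,\partial_S^2 C + b_1 S\,\partial_S C + \partial_t C - rC = 0 \\
C(T, S) = f\left(S, g(S)\right)
\end{cases}$$
where:

$$g(S) = S_2(0)\left(\frac{S}{S_1(0)}\right)^{\pm\sigma_2/\sigma_1}\exp\left(\left(b_2 - \frac{1}{2}\sigma_2^2 \pm \frac{\sigma_2}{\sigma_1}\left(\frac{1}{2}\sigma_1^2 - b_1\right)\right)T\right)$$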


The implied correlation Like the implied volatility, the implied correlation is the
value we put into the Black-Scholes formula to get the true market price. At first
sight, the concept of implied correlation seems to be straightforward. For instance,
let us consider composite options, whose payoff is defined by $(S_1(T) - kS_2(T))^+$. It
is a special case of the general payoff $(\alpha_1 S_1(T) + \alpha_2 S_2(T) - K)^+$ where $\alpha_1 = 1$,
$\alpha_2 = -k$ and $K = 0$. The parameters are those given in Example 98. The values
$(k, C_0)$ taken by the relative strike $k$ and the market price $C_0$ are respectively
equal to (0.10, 95.61), (0.20, 86.10), (0.30, 76.59), (0.40, 67.08), (0.50, 57.57), (0.60, 48.06),
(0.70, 38.62), (0.80, 29.46), (0.90, 21.12), (1.00, 14.32), (1.10, 9.45) and (1.20, 6.30). Using
these 12 market prices, we deduce the correlation smile with respect to $k$ in Figure 9.45. We
now consider the option, whose payoff is $\left(\frac{1}{2}S_1(T) + \frac{1}{2}S_2(T) - 100\right)^+$. Which correlation
should be used? There is no obvious answer. Indeed, we notice that a correlation smile is always
associated with a given payoff. This is why it is generally not possible to use a correlation
smile deduced from one payoff function to price the option with another payoff function.
Contrary to volatility, the concept of implied correlation makes sense, but not the concept
of correlation smile.


Riding on the smiles Until now, we have assumed that the volatilities of the two assets
are given. In practice, the two volatilities are unknown and must be deduced from the volatility
smiles $\Sigma_1(K_1, T)$ and $\Sigma_2(K_2, T)$ of the two assets. The difficulty is then to find the corresponding
strikes $K_1$ and $K_2$. In the case of the general payoff $(\alpha_1 S_1(T) + \alpha_2 S_2(T) - K)^+$,
we have:
$$\begin{cases}
(\alpha_1 = 1, \alpha_2 = 0, K > 0) \Rightarrow K_1 = K \\
(\alpha_1 = -1, \alpha_2 = 0, K < 0) \Rightarrow K_1 = -K
\end{cases}$$
and:

$$\begin{cases}
(\alpha_1 = 0, \alpha_2 = 1, K > 0) \Rightarrow K_2 = K \\
(\alpha_1 = 0, \alpha_2 = -1, K < 0) \Rightarrow K_2 = -K
\end{cases}$$

The payoff of the spread option can be written as follows:

$$\begin{aligned}
\left(S_1(T) - S_2(T) - K\right)^+ &= \left((S_1(T) - K_1) + (K_2 - S_2(T))\right)^+ \\
&\leq \underbrace{(S_1(T) - K_1)^+}_{\text{Call}} + \underbrace{(K_2 - S_2(T))^+}_{\text{Put}}
\end{aligned}$$

where $K_1 = K_2 + K$. Therefore, the price of the spread option can be bounded above by
a call price on $S_1$ plus a put price on $S_2$. However, the implicit strikes can take different
values. Let us assume that $S_1(0) = S_2(0) = 100$ and $K = 4$. Below, we give five pairs
$(K_1, K_2)$ and the associated implied volatilities $(\Sigma_1(K_1, T), \Sigma_2(K_2, T))$:

Pair                   #1      #2      #3      #4      #5
$K_1$                 104     103     102     101     100
$K_2$                 100      99      98      97      96
$\Sigma_1(K_1, T)$    16%     17%     18%     19%     20%
$\Sigma_2(K_2, T)$    20%     22%     24%     26%     28%
$C_0$               10.77   11.37   11.99   12.61   13.24

FIGURE 9.45: Correlation smile


We also compute the price of the spread option⁷⁸ and report it in the last row of the above
table. We notice that the price varies from 10.77 to 13.24, even if we use the same correlation 
parameter. We face here an issue, because this simple example shows that two-dimensional 
option pricing is not just an extension of one-dimensional option pricing, and the concept 
of implied volatility becomes blurred. 


9.3.2.2 The multi-asset case 
How to define a conservative price? In the multivariate case, the PDE becomes:
$$\frac{1}{2}\sum_{i=1}^n \sigma_i^2 S_i^2\,\partial_{S_i}^2 C + \sum_{i<j} \rho_{i,j}\sigma_i\sigma_j S_i S_j\,\partial_{S_i,S_j}^2 C + \sum_{i=1}^n b_i S_i\,\partial_{S_i} C + \partial_t C - rC = 0$$

with the terminal value:

$$C(T, S_1, \ldots, S_n) = f\left(S_1(T), \ldots, S_n(T)\right)$$

78 The parameters are $b_1 = 10\%$, $b_2 = 0\%$, $r = 5\%$, $\rho = 50\%$ and $T = 1$.
Here, $\rho_{i,j}$ is the correlation between the Brownian motions of $S_i$ and $S_j$. Most of the time,
the trader uses the same value $\rho$ for all asset correlations $\rho_{i,j}$.

Rapuch and Roncalli (2004) show that the price is increasing (resp. decreasing) with
respect to $\rho$ if $\sum_{i<j} \sigma_i\sigma_j\,\partial_{S_i,S_j}^2 f$ is a positive (resp. negative) measure. Let us consider the
payoff function $f(S_1, S_2, S_3) = (S_1 + S_2 - S_3 - K)^+$. We have:

$$\sum_{i<j} \sigma_i\sigma_j\,\partial_{S_i,S_j}^2 f = (\sigma_1\sigma_2 - \sigma_1\sigma_3 - \sigma_2\sigma_3) \cdot \mathbb{1}\{S_1 + S_2 - S_3 - K = 0\}$$
Hence, if $\sigma_1\sigma_2 - \sigma_1\sigma_3 - \sigma_2\sigma_3 > 0$, the price increases with respect to $\rho$, and if
$\sigma_1\sigma_2 - \sigma_1\sigma_3 - \sigma_2\sigma_3 < 0$, the price decreases with respect to $\rho$. As a result, it is more difficult to define
conservative prices for multi-asset options.


Issues with constant correlation matrices We consider a basket of $n$ stocks. The
basket volatility is given by:

$$\sigma_B^2 = \sum_{i=1}^n w_i^2\sigma_i^2 + 2\sum_{i>j} \rho_{i,j} w_i w_j \sigma_i\sigma_j$$

where $w_i$ is the weight of asset $i$ in the basket, $\sigma_i$ the volatility of asset $i$ and $\rho_{i,j}$ the
correlation between asset $i$ and asset $j$. The implied correlation $\rho_{\text{imp}}$ of the basket is defined
as the root of the following equation:

$$\sigma_B^2 - \sum_{i=1}^n w_i^2\sigma_i^2 - 2\rho_{\text{imp}}\sum_{i>j} w_i w_j \sigma_i\sigma_j = 0$$

Skintzi and Refenes (2003) deduce that:

$$\rho_{\text{imp}} = \frac{\sigma_B^2 - \sum_{i=1}^n w_i^2\sigma_i^2}{2\sum_{i>j} w_i w_j \sigma_i\sigma_j}$$

Another expression of the implied correlation is⁷⁹:

$$\rho_{\text{imp}} = \frac{\sigma_B^2 - \sum_{i=1}^n w_i^2\sigma_i^2}{\left(\sum_{i=1}^n w_i\sigma_i\right)^2 - \sum_{i=1}^n w_i^2\sigma_i^2}$$
The concept of implied correlation was very popular before the Global Financial Crisis.
It was at the heart of a strategy known as volatility dispersion trading, which consists in
selling variance swaps on an index and buying variance swaps on index components.
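
The second expression of the implied correlation is straightforward to compute once the basket volatility, the weights and the individual volatilities are known. The following sketch uses hypothetical function names and an arbitrary three-asset basket; it first builds the basket variance under a constant correlation and then recovers that correlation.

```python
from math import sqrt


def basket_variance(weights, vols, rho):
    """Basket variance when all pairwise correlations are equal to rho."""
    sum_sq = sum(w * w * s * s for w, s in zip(weights, vols))
    sq_sum = sum(w * s for w, s in zip(weights, vols)) ** 2
    # 2 * sum_{i>j} w_i w_j s_i s_j = (sum w_i s_i)^2 - sum w_i^2 s_i^2
    return sum_sq + rho * (sq_sum - sum_sq)


def implied_correlation(basket_vol, weights, vols):
    """Implied correlation deduced from the basket volatility."""
    sum_sq = sum(w * w * s * s for w, s in zip(weights, vols))
    sq_sum = sum(w * s for w, s in zip(weights, vols)) ** 2
    return (basket_vol ** 2 - sum_sq) / (sq_sum - sum_sq)


if __name__ == "__main__":
    # Illustrative basket (assumed values, not taken from the text)
    weights = [0.5, 0.3, 0.2]
    vols = [0.20, 0.25, 0.30]
    sigma_b = sqrt(basket_variance(weights, vols, 0.35))
    print(sigma_b, implied_correlation(sigma_b, weights, vols))
```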


The previous analysis assumes a constant correlation matrix $C_n(\rho)$ for modeling the
dependence between asset returns. Over time, it has become the standard for pricing basket

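79 Indeed, we have:

$$\sigma_{\max}^2 = \sum_{i=1}^n w_i^2\sigma_i^2 + 2\sum_{i>j} w_i w_j \sigma_i\sigma_j = \left(\sum_{i=1}^n w_i\sigma_i\right)^2$$
implying that:

$$2\sum_{i>j} w_i w_j \sigma_i\sigma_j = \left(\sum_{i=1}^n w_i\sigma_i\right)^2 - \sum_{i=1}^n w_i^2\sigma_i^2$$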
options with several assets. However, this approach implies a specific factor model. It is
equivalent to assuming that the underlying assets depend on a common risk factor with the
same sensitivity. With such an assumption, it is extremely difficult to estimate the conservative
price of basket options with barriers, best-of/worst-of options, etc. To illustrate this problem,
we consider the following payoff:

$$\left(S_1(T) - S_2(T) + S_3(T) - S_4(T) - K\right)^+ \cdot \mathbb{1}\{S_5(T) > L\}$$

We calculate the option price of maturity 3 months using the Black-Scholes model. We
assume that $S_i(0) = 100$ and $\Sigma_i = 20\%$ for the five underlying assets, the strike $K$ is equal
to 5, the barrier $L$ is equal to 105, and the interest rate $r$ is set to 5%. In Figure 9.46,
we report the option price when the correlation matrix is $C_5(\rho)$. Since the option price
decreases with respect to $\rho$, it can be bounded above by 2.20. If we simulate correlation
matrices with uniform singular values, we notice that the maximum price of 2.20 is not a
conservative price. For instance, if we consider the correlation matrix below, we obtain an
option price of 3.99:


$$C = \begin{pmatrix}
1.0000 & 0.2397 & 0.7435 & -0.1207 & 0.0563 \\
0.2397 & 1.0000 & -0.0476 & -0.0260 & -0.1958 \\
0.7435 & -0.0476 & 1.0000 & 0.2597 & 0.1153 \\
-0.1207 & -0.0260 & 0.2597 & 1.0000 & -0.7568 \\
0.0563 & -0.1958 & 0.1153 & -0.7568 & 1.0000
\end{pmatrix}$$




FIGURE 9.46: Price of the basket option with respect to the constant correlation 


9.3.2.3 The copula method 


Using Sklar’s theorem, it follows that the multivariate risk-neutral distribution has the
following canonical representation:

$$\mathbb{Q}\left(S_1(t), \ldots, S_n(t)\right) = \mathbf{C}^{\mathbb{Q}}\left(\mathbb{Q}_1(S_1(t)), \ldots, \mathbb{Q}_n(S_n(t))\right)$$
$\mathbf{C}^{\mathbb{Q}}$ is called the risk-neutral copula (Cherubini and Luciano, 2002). The copula approach has
been extensively used in order to derive the bounds of basket options. For instance, Rapuch
and Roncalli (2004) extend the results presented in Section 9.3.2.1 on page 583 to the copula
approach. In particular, they show that if the payoff function $f$ is supermodular⁸⁰, then the
option price increases with respect to the concordance order. More explicitly, we have:

$$\mathbf{C}_1 \prec \mathbf{C}_2 \Rightarrow C_0(S_1, S_2; \mathbf{C}_1) \leq C_0(S_1, S_2; \mathbf{C}_2)$$


Therefore, the previous results hold if we replace the Black-Scholes model with the Normal 
copula model. Thus, the spread option is a decreasing function of the Normal copula pa- 
rameter p even if we use a local or stochastic volatility model in place of the Black-Scholes 
model. In a similar way, one can find lower and upper bounds of multi-asset option prices 
by considering lower and upper Fréchet copulas. As shown by Tankov (2011), these bounds 
can be improved significantly when partial information is available such as the prices of 
digital basket options. 


In practice, the Normal copula model is extensively used for pricing multi-asset
European-style options for two reasons:

1. The first one is that multi-asset option prices must be ‘compatible’ with single-asset
option prices. This means that it would be inadequate to price single-asset options
with a complex model, e.g. the SABR model, and at the same time to price multi-asset
options with the multivariate Black-Scholes model. Indeed, this decoupling approach
creates arbitrage opportunities at the level of the bank itself.


2. The Normal copula model is a natural extension of the multivariate Black-Scholes 
model since the dependence function is the same. 


Nevertheless, we face an issue because the pricing of the payoff $f(S_1(T), \ldots, S_n(T))$ requires
knowing the joint distribution of the random vector $(S_1(T), \ldots, S_n(T))$, for which an
analytical expression does not generally exist⁸¹. This is why multi-asset options are priced
using the Monte Carlo method. However, the analytical distribution of the marginals is
generally unknown. Therefore, we have to implement the method of empirical quantile functions
described on page 806:

1. for each random variable $S_i(T)$, simulate $m_1$ random variates $S_{i,m}^\star$ and estimate the
empirical distribution $\hat F_i$;

2. simulate a random vector $(u_{1,j}, \ldots, u_{n,j})$ from the copula function $\mathbf{C}(u_1, \ldots, u_n)$;

3. simulate the random vector $(S_{1,j}, \ldots, S_{n,j})$ by inverting the empirical distributions
$\hat F_i$:

$$S_{i,j} \leftarrow \hat F_i^{-1}(u_{i,j})$$

or equivalently:

$$S_{i,j} \leftarrow \inf\left\{x \;\middle|\; \frac{1}{m_1}\sum_{m=1}^{m_1} \mathbb{1}\{S_{i,m}^\star \leq x\} \geq u_{i,j}\right\}$$

4. repeat steps 2 and 3 $m_2$ times;

5. the MC estimate of the option price is equal to:
$$\hat C_0 = e^{-rT}\,\frac{1}{m_2}\sum_{j=1}^{m_2} f\left(S_{1,j}, \ldots, S_{n,j}\right)$$

80 The function $f$ is supermodular if and only if:

$$\Delta^{(2)} f := f(x_1 + \varepsilon_1, x_2 + \varepsilon_2) - f(x_1 + \varepsilon_1, x_2) - f(x_1, x_2 + \varepsilon_2) + f(x_1, x_2) \geq 0$$

for all $(x_1, x_2) \in \mathbb{R}_+^2$ and $(\varepsilon_1, \varepsilon_2) \in \mathbb{R}_+^2$.
81 An exception concerns the SABR model for which we have found an expression of the probability
distribution thanks to the Breeden-Litzenberger representation.

It follows that the first step is used for estimating the distribution of $S_i(T)$. For this
step, we use $m_1$ simulations of the single-asset option model. However, this step generates
independent random variables. Therefore, steps 2 and 3 are used in order to create the
right dependence between $(S_1(T), \ldots, S_n(T))$.
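
The five steps can be sketched as follows for two assets and the Normal copula. This is an illustration under simplifying assumptions: the single-asset model is taken to be a lognormal (Black-Scholes) simulator, which stands in for any model that can simulate $S_i(T)$, and the function names are hypothetical.

```python
import random
from math import ceil, exp, sqrt
from statistics import NormalDist


def lognormal_simulator(S0, b, sigma, T):
    """Hypothetical single-asset model: returns a function drawing S(T)."""
    def draw(rng):
        return S0 * exp((b - 0.5 * sigma ** 2) * T + sigma * sqrt(T) * rng.gauss(0.0, 1.0))
    return draw


def empirical_quantile(sorted_sample, u):
    """inf{x : F_hat(x) >= u} for the empirical distribution of the sample."""
    m = len(sorted_sample)
    k = min(max(ceil(u * m), 1), m)  # smallest k such that k / m >= u
    return sorted_sample[k - 1]


def copula_mc_price(simulators, payoff, rho, r, T, m1=20000, m2=20000, seed=0):
    """Two-asset option price with empirical marginals and a Normal copula."""
    rng = random.Random(seed)
    Phi = NormalDist().cdf
    # Step 1: simulate each marginal independently and sort the variates
    samples = [sorted(sim(rng) for _ in range(m1)) for sim in simulators]
    total = 0.0
    for _ in range(m2):  # Step 4: repeat steps 2 and 3
        # Step 2: draw (u1, u2) from the Normal copula with parameter rho
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 3: invert the empirical distributions
        s1 = empirical_quantile(samples[0], Phi(z1))
        s2 = empirical_quantile(samples[1], Phi(z2))
        total += payoff(s1, s2)
    # Step 5: discounted average payoff
    return exp(-r * T) * total / m2


if __name__ == "__main__":
    sims = [lognormal_simulator(100.0, 0.05, 0.20, 1.0),
            lognormal_simulator(100.0, 0.05, 0.20, 1.0)]
    exchange = lambda s1, s2: max(s1 - s2, 0.0)
    print(copula_mc_price(sims, exchange, rho=0.5, r=0.05, T=1.0))
```

With lognormal marginals and the Normal copula, the model coincides with the bivariate Black-Scholes model, so the exchange option of the usage example can be checked against the Margrabe formula, which gives a price of about 7.97 for these parameters.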


Example 99 We consider the two-asset option with the following payoff:

$$f\left(F_1(T), F_2(T)\right) = 100 \cdot \left(\max\left(F_1(T), F_2(T)\right) - K\right)^+$$

where $F_1(t)$ and $F_2(t)$ are two forward rates. We assume that $F_1(0) = 5\%$ and $F_2(0) = 6\%$.
The maturity of the option is equal to one year, whereas the strike of the option is set to
2%. Using the SABR model, we have calibrated the volatility smiles and we have obtained
the following estimates:

          $\alpha$    $\beta$   $\nu$    $\rho$
$F_1$      8.944%     1.00      0.322    -22.901%
$F_2$     12.404%     1.00      0.280     16.974%


In Figure 9.47, we have reported the price of the two-asset option with respect to the
dependence parameter $\rho$. For the Black-Scholes model, we use the ATM implied volatilities⁸²
and the parameter $\rho$ represents the implied correlation. For the SABR model, we use the
Normal copula model, and $\rho$ is the copula parameter. We notice that the Black-Scholes
model overestimates the option price compared to the SABR model. We also verified that
the option price is a decreasing function with respect to $\rho$.


9.3.3 Liquidity risk 


Liquidity risk can be incorporated in the theory of option pricing, but it requires solving 
a stochastic optimal control problem (Çetin et al., 2004, 2006; Jarrow and Protter, 2007; 
Çetin et al., 2010). In practice, these approaches are not used by professionals, but some 
theoretical results help to understand the impact of liquidity risk on option pricing. However, 
there is no satisfactory solution, and ‘cooking recipes’ differ from one bank to another one, 
one trading desk to another one, one trader to another one. But the issue here is not to 
solve this problem, but to understand the model risk from a risk management perspective. 

It is obvious that liquidity risk impacts trading costs, in particular the price of the 
replication strategy because of bid-ask spreads. Here, we don’t want to focus on ‘normal’ 
liquidity risk, but on ‘trading’ liquidity risk. Option theory assumes that we can replicate the 
option, meaning that we can sell or buy the underlying asset at any time. For liquid assets, 
this assumption is almost verified, even if we can face high bid-ask spreads. For less liquid
assets, this assumption is not verified. Let us consider one of the most famous examples, 
which concerns call options on Sharpe ratio. Starting from 2004, some banks proposed 


82 They are equal to 9% for $F_1$ and 12.5% for $F_2$.
FIGURE 9.47: Comparison of the option price obtained with Black-Scholes and copula-SABR
models


to investors a payoff of the form $(\text{SR}(0; T) - K)^+$ where $\text{SR}(0; T)$ is the Sharpe ratio of
the underlying asset during the option period. This payoff is relatively easy to replicate.
However, most call options on Sharpe ratio have been written on mutual funds and hedge
funds. The difficulty comes from the liquidity of these underlying assets. For instance, the
trader does not know exactly the price of the asset when he executes his order because of
the notice period⁸³. This can be a big issue when the fund offers weekly or monthly liquidity.
The second problem comes from the fact that the fund manager can impose lock-up periods
and gates. For instance, a gate limits the amount of withdrawals. During the 2008/2009
hedge fund crisis, many traders faced gate provisions and were unable to adjust their delta.
This crisis marked the end of call options on Sharpe ratio.


The previous example is an extreme case of the impact of liquidity on option trading.
However, this type of problem is not unusual even with liquid markets, because liquidity is
time-varying and may impact delta hedging at the worst possible time. Let us consider the
replication of a call option. If the price of the underlying asset decreases sharply, the delta
is reduced and the option trader has to sell asset shares. Because of their trend-following
aspect, option traders generally buy assets when the market goes up and sell assets when
the market goes down. However, we know that liquidity is asymmetric between these two
market regimes. Therefore, it is more difficult to adjust the delta exposure when the market
goes down, because of the lack of liquidity. This means that some payoffs are more sensitive
to liquidity risk than others.


83 A subscription/redemption notice period requires that the investor inform the fund manager a certain
period in advance before buying/selling fund shares.

9.4 Exercises
9.4.1 Option pricing and martingale measure

We consider the Black-Scholes model. The price process $S(t)$ follows a geometric Brownian
motion:
$$dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t)$$
and the risk-free asset $B(t)$ satisfies:
$$dB(t) = r B(t)\,dt$$
We consider a portfolio $(\phi(t), \psi(t))$ invested in the stock $S$ and the risk-free bond $B$. We
note $V(t)$ the value of this portfolio.

1. Show that:
$$dV(t) = r V(t)\,dt + \phi(t)\,(dS(t) - r S(t)\,dt)$$

2. We note $\tilde{V}(t) = e^{-rt} V(t)$ and $\tilde{S}(t) = e^{-rt} S(t)$. Show that:
$$d\tilde{V}(t) = \phi(t)\,d\tilde{S}(t)$$

3. Show that $\tilde{V}(t)$ is a martingale under the risk-neutral probability measure $\mathbb{Q}$. Deduce that:
$$V(t) = e^{-r(T-t)}\,\mathbb{E}^{\mathbb{Q}}\left[V(T) \mid \mathcal{F}_t\right]$$


4. Define the corresponding martingale measure. 


5. Calculate the price of the binary option 1{S(T) > K}. 


9.4.2 The Vasicek model 


Vasicek (1977) assumes that the instantaneous interest rate follows an Ornstein-Uhlenbeck
process:
$$\begin{cases} dr(t) = a\,(b - r(t))\,dt + \sigma\,dW(t) \\ r(t_0) = r_0 \end{cases}$$
and the risk price of the Wiener process is constant:
$$\lambda(t) = \lambda$$
We consider the pricing of a zero-coupon bond, whose maturity is equal to $T$.

1. Write the partial differential equation of the zero-coupon bond $B(t, r)$ when the
interest rate $r(t)$ is equal to $r$.

2. Using the solution of the Ornstein-Uhlenbeck process given on page 1075, show that
the random variable $Z$ defined by:
$$Z = \int_t^T r(s)\,ds$$
is Gaussian.

3. Calculate the first two moments.

4. Deduce the price of the zero-coupon bond.

9.4.3 The Black model

In the model of Black (1976), we assume that the price $F(t)$ of a forward or futures
contract evolves as follows:
$$dF(t) = \sigma F(t)\,dW(t)$$

1. Write the PDE associated with the call option payoff:
$$C(T) = \max(F(T) - K, 0)$$
when the interest rate is equal to $r$.

2. Using the Feynman-Kac representation theorem, deduce the current price of the call
option.

3. We assume that the stock price $S(t)$ follows a geometric Brownian motion:
$$dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t)$$
Show that the Black formula can be used to price a European option, whose underlying
asset is the futures contract of the stock.

4. What does the Black formula become if we assume that the interest rate $r(t)$ is
stochastic and is independent of the forward price $F(t)$?

5. What is the problem if we consider that the interest rate $r(t)$ and the forward price
$F(t)$ are not independent?

6. We recall that the price of the zero-coupon bond is given by:
$$B(t, T) = \mathbb{E}^{\mathbb{Q}}\left[ e^{-\int_t^T r(s)\,ds} \,\Big|\, \mathcal{F}_t \right]$$
The instantaneous forward rate $f(t, T)$ is defined as follows:
$$f(t, T) = -\frac{\partial \ln B(t, T)}{\partial T}$$
We consider that the numéraire is the bond price $B(t, T)$ and we note $\mathbb{Q}^\star$ the associated
forward probability measure.


(a) Show that: 


aB(t,T Pe 
PRED L BET) EY TTF 
(b) Deduce that $f(t, T)$ is an $\mathcal{F}_t$-martingale under the forward probability measure $\mathbb{Q}^\star$.
(c) Find the price of the call option, whose payoff is equal to:
$$C(T) = \max(f(T, T) - K, 0)$$


9.4.4 Change of numéraire and Girsanov theorem 

Part one 
Let $X(t)$ and $Y(t)$ be two $\mathcal{F}_t$-adapted processes.
1. Calculate the stochastic differentials $d(X(t)\,Y(t))$ and $d(1/Y(t))$.
2. We note $Z(t)$ the ratio of $X(t)$ and $Y(t)$. Show that:
$$\frac{dZ(t)}{Z(t)} = \frac{dX(t)}{X(t)} - \frac{dY(t)}{Y(t)} + \frac{\langle dY(t), dY(t)\rangle}{Y^2(t)} - \frac{\langle dX(t), dY(t)\rangle}{X(t)\,Y(t)}$$

Part two

Let $S(t)$ be the price of an asset. Under the probability measure $\mathbb{Q}$, $S(t)$ has the following
dynamics:
$$dS(t) = \mu_S(t)\,S(t)\,dt + \sigma_S(t)\,S(t)\,dW^{\mathbb{Q}}(t)$$
The corresponding numéraire is denoted by $M(t)$ and we have:
$$dM(t) = \mu_M(t)\,M(t)\,dt + \sigma_M(t)\,M(t)\,dW^{\mathbb{Q}}(t)$$
We now consider another numéraire $N(t)$ whose dynamics is given by:
$$dN(t) = \mu_N(t)\,N(t)\,dt + \sigma_N(t)\,N(t)\,dW^{\mathbb{Q}}(t)$$
and we note $\mathbb{Q}^\star$ the probability measure associated to $N(t)$. We assume that:
$$dS(t) = \mu_S^\star(t)\,S(t)\,dt + \sigma_S(t)\,S(t)\,dW^{\mathbb{Q}^\star}(t)$$

1. Why can we assume that the diffusion coefficient of $S(t)$ is the same under the two
probability measures $\mathbb{Q}$ and $\mathbb{Q}^\star$?

2. Find the process $g(t)$ such that:
$$dW^{\mathbb{Q}^\star}(t) = dW^{\mathbb{Q}}(t) - g(t)\,dt$$

3. Let $Z(t)$ be the Radon-Nikodym derivative defined by:
$$Z(t) = \frac{d\mathbb{Q}^\star}{d\mathbb{Q}}$$


Show that: 


Deduce that: 


Find the expression of uy (t). 


4. Show that changing the numéraire is equivalent to changing the drift:
$$\mu_S^\star(t) = \mu_S(t) + \sigma_S(t)\,(\sigma_N(t) - \sigma_M(t))$$

5. Deduce that:

Part three

Under the risk-neutral probability measure $\mathbb{Q}$, we assume that the asset price and the
numéraire are given by the following stochastic differential equations:
$$dS(t) = r(t)\,S(t)\,dt + \sigma_S(t)\,S(t)\,dW_S^{\mathbb{Q}}(t)$$
and:
$$dN(t) = r(t)\,N(t)\,dt + \sigma_N(t)\,N(t)\,dW_N^{\mathbb{Q}}(t)$$
where $N(0) = 1$, $W_S^{\mathbb{Q}}(t)$ and $W_N^{\mathbb{Q}}(t)$ are two Wiener processes and $\mathbb{E}\left[W_S^{\mathbb{Q}}(t)\,W_N^{\mathbb{Q}}(t)\right] = \rho\,t$.
We note $\tilde{S}(t) = S(t)/N(t)$ the asset price expressed in the numéraire $N(t)$.

1. Find the stochastic differential equation of $\tilde{S}(t)$:


i= u 


2. Let $\mathbb{Q}^\star$ be the martingale measure associated to the numéraire $N(t)$.
(a) We assume that $\sigma_N(t) = 0$. Show that the discounted asset price is an $\mathcal{F}_t$-martingale
under the risk-neutral probability measure.
(b) We consider the case $W_S^{\mathbb{Q}}(t) = W_N^{\mathbb{Q}}(t)$. Using Girsanov theorem, show that:
$$d\tilde{S}(t) = \tilde{\sigma}(t)\,\tilde{S}(t)\,dW^{\mathbb{Q}^\star}(t)$$
where $W^{\mathbb{Q}^\star}$ is a Brownian motion under the probability measure $\mathbb{Q}^\star$ and $\tilde{\sigma}(t)$ is
a function to be defined.


(c) What does this result become in the general case? 


9.4.5 The HJM model and the forward probability measure 


We assume that the instantaneous forward rate $f(t, T_1)$ is given by the following stochastic
differential equation:
$$df(t, T_1) = \alpha(t, T_1)\,dt + \sigma(t, T_1)\,dW^{\mathbb{Q}}(t)$$
where $\mathbb{Q}$ is the risk-neutral probability measure.

1. We consider the forward probability measure $\mathbb{Q}^\star(T_2)$ where $T_2 > T_1$. Define the
corresponding numéraire $N(t)$ and show that the Radon-Nikodym derivative is equal to:
$$\frac{d\mathbb{Q}^\star}{d\mathbb{Q}} = e^{-\int_0^{T_2} (r(t) - f(0,t))\,dt}$$

2. We recall that the dynamics of the instantaneous spot rate $r(t)$ is:
$$r(t) = f(0,t) + \int_0^t \left( \sigma(s,t) \int_s^t \sigma(s,u)\,du \right) ds + \int_0^t \sigma(s,t)\,dW^{\mathbb{Q}}(s)$$
Show that:
$$\frac{d\mathbb{Q}^\star}{d\mathbb{Q}} = e^{\int_0^{T_2} a(t,T_2)\,dt + \int_0^{T_2} b(t,T_2)\,dW^{\mathbb{Q}}(t)}$$
where:
$$a(t, T_2) = -\int_t^{T_2} \left( \sigma(t,v) \int_t^v \sigma(t,u)\,du \right) dv$$
and:
$$b(t, T_2) = -\int_t^{T_2} \sigma(t,v)\,dv$$

3. Using the drift restriction in the HJM model, show that:
$$W^{\mathbb{Q}^\star(T_2)}(t) = W^{\mathbb{Q}}(t) - \int_0^t b(s, T_2)\,ds$$
is a Brownian motion under the forward probability measure $\mathbb{Q}^\star(T_2)$.
4. Find the dynamics of $f(t, T_1)$ under the forward probability measure $\mathbb{Q}^\star(T_2)$.
5. Show that $f(t, T_1)$ is a martingale under the forward probability measure $\mathbb{Q}^\star(T_1)$.
6. We recall that the price of the zero-coupon bond satisfies the SDE:
$$dB(t, T) = r(t)\,B(t, T)\,dt + b(t, T)\,B(t, T)\,dW^{\mathbb{Q}}(t)$$
(a) Show that:
$$\frac{B(t, T_2)}{B(t, T_1)} = \frac{B(s, T_2)}{B(s, T_1)}\,e^{X(s,t)}$$
where $X(s, t)$ is a random variable to define.
(b) Deduce that $B(t, T_2)/B(t, T_1)$ is a martingale under $\mathbb{Q}^\star(T_1)$.


9.4.6 Equivalent martingale measure in the Libor market model 


Let $L_i(t) = L(t, T_i, T_{i+1})$ be the forward Libor rate when resetting and maturity dates
are respectively equal to $T_i$ and $T_{i+1}$. Under the forward probability measure $\mathbb{Q}^\star(T_{i+1})$, the
dynamics of $L_i(t)$ is given by the following SDE:
$$dL_i(t) = \gamma_i(t)\,L_i(t)\,dW_i^{\mathbb{Q}^\star(T_{i+1})}(t)$$

1. Using the definition of the Libor rate, find the relationship between $B(t, T_{i+1})/B(t, T_i)$
and $L_i(t)$. Let $T_{k+1} > T_{i+1}$. Deduce an expression of the ratio:
$$\frac{B(t, T_{k+1})}{B(t, T_{i+1})}$$
in terms of Libor rates $L_j(t)$ ($j = i+1, \ldots, k$).

2. We change the probability measure from $\mathbb{Q}^\star(T_{i+1})$ to $\mathbb{Q}^\star(T_{k+1})$. Define the numéraires
$M(t)$ and $N(t)$ associated to $\mathbb{Q}^\star(T_{i+1})$ and $\mathbb{Q}^\star(T_{k+1})$. Deduce an expression of $Z(t)$:
$$Z(t) = \frac{d\mathbb{Q}^\star(T_{k+1})}{d\mathbb{Q}^\star(T_{i+1})}$$
in terms of Libor rates $L_j(t)$ ($j = i+1, \ldots, k$).

3. Calculate $d \ln Z(t)$.

4. Calculate the drift $\zeta$ defined by:
$$\zeta = \left\langle \frac{dL_i(t)}{L_i(t)},\, d \ln Z(t) \right\rangle$$

5. Show that the dynamics of $L_i(t)$ under the forward probability measure $\mathbb{Q}^\star(T_{k+1})$ is
given by:
$$\frac{dL_i(t)}{L_i(t)} = \mu_{i,k}(t)\,dt + \gamma_i(t)\,dW_k^{\mathbb{Q}^\star(T_{k+1})}(t)$$
where $\mu_{i,k}(t)$ is a drift to determine.

6. What do the previous results become if $T_{k+1} < T_{i+1}$?


9.4.7 Displaced diffusion option pricing 
Brigo and Mercurio (2002a) consider the diffusion process $X(t)$ given by:
$$\begin{cases} dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW^{\mathbb{Q}}(t) \\ X(0) = X_0 \end{cases}$$
They assume that the asset price $S(t)$ is an affine transformation of $X(t)$:
$$S(t) = \alpha(t) + \beta(t) \cdot X(t)$$
where $\beta(t) > 0$.

1. By applying Itô's lemma to $S(t)$, find the condition on $\alpha(t)$ and $\beta(t)$ in order to
satisfy the martingale condition:
$$\mathbb{E}^{\mathbb{Q}}\left[ e^{-bt}\,S(t) \mid \mathcal{F}_0 \right] = S_0$$
where $b$ is the cost-of-carry parameter.

2. We consider the CEV process:
$$dX(t) = \mu(t)\,X(t)\,dt + \sigma(t)\,X(t)^{\gamma}\,dW^{\mathbb{Q}}(t)$$
where $\gamma \in [0, 1]$. Show that the solutions of $\alpha(t)$ and $\beta(t)$ are:
$$\alpha(t) = \alpha_0 \cdot \exp(bt)$$
$$\beta(t) = \beta_0 \cdot \exp\left( \int_0^t (b - \mu(s))\,ds \right)$$

3. Deduce the SDE of $S(t)$.

4. We consider the case $\gamma = 1$. Give the SDE of $X(t)$. Calculate the solutions of $X(t)$
and $S(t)$.

5. Give the price of the European call option, whose payoff is equal to $(S(T) - K)^{+}$.

6. We now assume that $\sigma(t) = \sigma$.

(a) Using the formula of Lee and Wang (2012), give an approximation of the implied
volatility $\Sigma(T, K)$.
(b) Calculate the volatility skew:


(c) Give the price of the binary call option in the case of the BS model.
(d) Deduce the BCC price when we consider the SLN model.
(e) Give an approximation of the BCC price based on the implied volatility skew.

9.4.8 Dupire local volatility model
We assume that:
$$dS(t) = b\,S(t)\,dt + \sigma(t, S(t))\,S(t)\,dW^{\mathbb{Q}}(t)$$

1. Give the forward equation for pricing the call option $C(T, K)$. Deduce the expression
of the local variance $\sigma^2(T, K)$.

2. Using the Black-Scholes formula, find the relationship between the local volatility
$\sigma(T, K)$ and the implied volatility $\Sigma(T, K)$.


3. We consider the discounted payoff function:
$$f(T, S(T)) = e^{-rT}\,(S(T) - K)^{+}$$
Using Itô's lemma, calculate the derivative of the call option with respect to the
maturity:
$$\frac{\mathbb{E}\left[ df(T, S(T)) \mid \mathcal{F}_0 \right]}{dT}$$

4. Calculate $\partial_K C(T, K)$ and $\partial_K^2 C(T, K)$ using the discounted payoff function. Retrieve
the forward equation⁸⁴ of Dupire (1994):

$\partial_T C(T, K) =$

5. We introduce the log-moneyness $x$:
$$x = \varphi(T, K) = \ln\frac{S_0}{K} + bT$$
and the functions $\tilde{\sigma}(T, x)$ and $\tilde{\Sigma}(T, x)$, which are defined by the relationships:
$$\Sigma(T, K) = \tilde{\Sigma}(T, \varphi(T, K))$$
and:
$$\sigma(T, K) = \tilde{\sigma}(T, \varphi(T, K))$$
(a) Calculate $d_1$, $d_2$ and $d_1 d_2$.
(b) Write the derivatives $\partial_K \Sigma(T, K)$, $\partial_T \Sigma(T, K)$ and $\partial_K^2 \Sigma(T, K)$ using the variables
$T$ and $x$.
(c) Deduce the relationship between $\tilde{\sigma}(T, x)$ and $\tilde{\Sigma}(T, x)$.
(d) Show that:
$$\partial_x \tilde{\Sigma}(0, 0) = \frac{1}{2}\,\partial_x \tilde{\sigma}(0, 0)$$


9.4.9 The stochastic normal model 


Let $F(t)$ be the forward rate. We assume that the dynamics of $F(t)$ is given by the
SABR model:
$$\begin{cases} dF(t) = \alpha(t)\,F(t)^{\beta}\,dW_1^{\mathbb{Q}}(t) \\ d\alpha(t) = \nu\,\alpha(t)\,dW_2^{\mathbb{Q}}(t) \end{cases}$$
where $\mathbb{E}\left[W_1^{\mathbb{Q}}(t)\,W_2^{\mathbb{Q}}(t)\right] = \rho\,t$. In what follows, we consider the special case $\beta = 0$.

1. How to transform the Black volatility $\Sigma_B(T, K)$ into the implied normal volatility
$\Sigma_N(T, K)$?

2. Give the expression of the implied normal volatility $\Sigma_N(T, K)$ for the general case
$\beta \in [0, 1]$.

3. Deduce the formula of $\Sigma_N(T, K)$ when $\beta = 0$.

4. What is the ATM normal volatility?

5. Calculate $\partial_K \Sigma_N(T, K)$.

6. Recall the price of the call option for the normal model, whose volatility is $\sigma_N$.

7. We now assume that $\sigma_N$ is equal to the SABR normal volatility $\Sigma_N(T, K)$. Deduce
the cumulative distribution function of $F(T)$.

8. By considering the following approximation⁸⁶:
$$\sqrt{F_0 K}\,\ln\frac{F_0}{K} \approx F_0 - K$$
calculate the probability density function of $F(T)$.

9. Show that:
$$F(t) = F_0 + \frac{\alpha}{\nu} \int_0^{\nu^2 t} \exp\left( -\frac{1}{2}s + W_2(s) \right) dW_1(s)$$
where $W_1(t)$ and $W_2(t)$ have the same properties as $W_1^{\mathbb{Q}}(t)$ and $W_2^{\mathbb{Q}}(t)$.

10. We note:
$$X(t) = \int_0^t \exp\left( -\frac{1}{2}s + W_2(s) \right) dW_1(s)$$
and:
$$M^{a}(t) = \exp\left( -\frac{1}{2}a\,t + a\,W_2(t) \right)$$
Let us introduce the function $\Psi^{n,a}(t)$:
$$\Psi^{n,a}(t) = \mathbb{E}\left[ X^{n}(t)\,M^{a}(t) \right]$$
where $n \in \mathbb{N}$ and $a \in \mathbb{R}_{+}$. Verify that $\Psi^{n,a}(t)$ satisfies the ordinary differential
equation:
$$\frac{d\Psi^{n,a}(t)}{dt} = \frac{a(a-1)}{2}\,\Psi^{n,a}(t) + n \rho a\,\Psi^{n-1,a+1}(t) + \frac{n(n-1)}{2}\,\Psi^{n-2,a+2}(t)$$
where $\Psi^{n,a}(0) = 0$. What is the link between $\Psi^{n,a}(t)$ and the statistical moments of
$F(t)$?

11. Calculate $\Psi^{0,a}(t)$, $\Psi^{1,a}(t)$, $\Psi^{2,a}(t)$, $\Psi^{3,0}(t)$ and $\Psi^{4,0}(t)$. Deduce the first four central
moments of $F(t)$.

12. Calculate an approximation of the volatility, skewness and kurtosis of $F(t)$ when
$t \approx 0$.

84 This approach has also been proposed by Derman et al. (1996).


85 Hagan et al. (2002) calculate this expression in Appendix A.4 on page 102.
86 Hagan et al. (2002), Equations (A67b) and (A68a), page 102.

13. We assume that $F_0 = 10\%$ and $T = 1$, and we consider the following smile:

    K              7%    10%    13%
    Σ_B(T, K)     30%    20%    30%

(a) Calculate the equivalent normal volatility $\Sigma_N(T, K)$.
(b) Calibrate the parameters of the stochastic normal model.
(c) Draw the cumulative distribution function of $F(T)$. What is the problem?
(d) Draw the probability density function of $F(T)$ when we consider the approximation
$\sqrt{F_0 K}\,\ln\frac{F_0}{K} \approx F_0 - K$.
(e) Calculate the skewness and the kurtosis of $F(T)$. Comment on these results.


9.4.10 The quadratic Gaussian model 
We consider the quadratic Gaussian model:
$$r(t) = \alpha(t) + \beta(t)^\top X(t) + X(t)^\top \Gamma(t)\,X(t)$$
where the state variables $X(t)$ follow an Ornstein-Uhlenbeck process:
$$dX(t) = (a(t) + B(t)\,X(t))\,dt + \Sigma(t)\,dW^{\mathbb{Q}}(t)$$
1. Find the PDE associated to the zero-coupon bond $B(t, T)$.
2. We assume that the solution of $B(t, T)$ has the following form:
$$B(t, T) = \exp\left( -\hat{\alpha}(t, T) - \hat{\beta}(t, T)^\top X(t) - X(t)^\top \hat{\Gamma}(t, T)\,X(t) \right)$$
where $\hat{\Gamma}(t, T)$ is a symmetric matrix. Show that $\hat{\alpha}(t, T)$, $\hat{\beta}(t, T)$ and $\hat{\Gamma}(t, T)$ satisfy
a system of ODEs.
3. Find a condition that $\hat{\Gamma}(t, T)$ is a symmetric matrix. Why do we need this hypothesis?

4. Let $\mathbb{Q}^\star(T)$ be the forward probability measure. Recall the dynamics of $X(t)$ under
$\mathbb{Q}^\star(T)$. Using the explicit solution, demonstrate that $X(t)$ is Gaussian:
$$X(t) \sim \mathcal{N}(m(0, t), V(0, t))$$
Find the dynamics of $m(0, t)$ and $V(0, t)$. Compare these results with those obtained
by El Karoui et al. (1992a).

5. Define the Libor rate $L(t, T_{i-1}, T_i)$.

6. Demonstrate that the pricing formula of the caplet is equal to:
$$\text{Caplet} = B(0, t) \cdot \mathbb{E}^{\mathbb{Q}^\star(t)}\left[ \max(0, g(X)) \right]$$
where $\mathbb{Q}^\star(t)$ is the forward probability measure and $g(x)$ is a function to define.

7. Show that:
$$\text{Caplet} = B(0, t) \int_{\mathcal{E}} h(x)\,dx$$
where $h(x) = g(x)\,\phi(x; m(0, t), V(0, t))$ and $\mathcal{E}$ is a set to define.

8. We consider the following function:
$$\mathcal{J}(a, b, c, m, V, x_1, x_2) = \frac{1}{\sqrt{2\pi V}} \int_{x_1}^{x_2} e^{-ax^2 - bx - c}\,e^{-\frac{(x - m)^2}{2V}}\,dx$$
Find the analytical expression of $\mathcal{J}$.

9. Deduce the analytical expression of the caplet.


9.4.11 Pricing two-asset basket options 
We assume that the risk-neutral dynamics of $S_1(t)$ and $S_2(t)$ are given by:
$$\begin{cases} dS_1(t) = b_1 S_1(t)\,dt + \sigma_1 S_1(t)\,dW_1^{\mathbb{Q}}(t) \\ dS_2(t) = b_2 S_2(t)\,dt + \sigma_2 S_2(t)\,dW_2^{\mathbb{Q}}(t) \end{cases}$$
where $W_1^{\mathbb{Q}}(t)$ and $W_2^{\mathbb{Q}}(t)$ are two correlated Brownian motions:
$$\mathbb{E}\left[ W_1^{\mathbb{Q}}(t)\,W_2^{\mathbb{Q}}(t) \right] = \rho\,t$$

1. By considering the following payoff $(\alpha_1 S_1(T) + \alpha_2 S_2(T) - K)^{+}$, show that the price
of the option can be expressed as a double integral.

2. We consider the computation of $I = \mathbb{E}\left[ (A e^{b\varepsilon + c} - D)^{+} \right]$ where $\varepsilon \sim \mathcal{N}(0, 1)$, and $A$,
$b$, $c$ and $D$ are four scalars.

(a) Find the value of $I$ when $A > 0$ and $D > 0$.
(b) Deduce the value of $I$ in the other cases.

3. We assume that $\alpha_1 < 0$, $\alpha_2 > 0$ and $K > 0$. Using the Cholesky decomposition,
reduce the computation of the double integral to a single integral.

4. Extend this result to the case $\alpha_1 > 0$, $\alpha_2 < 0$ and $K > 0$.

5. Discuss the general case.


Chapter 10 


Statistical Inference and Model Estimation 


In this chapter, we present the statistical tools used in risk management. The first section
concerns estimation methods that are essential to calibrate the parameters of a statistical
model. This includes linear regression, which is the standard statistical tool to investigate
the relationships between data in empirical research, and the method of maximum likelihood
(ML), whose goal is to estimate the parameters of non-linear and non-Gaussian financial models.
We also present the generalized method of moments (GMM), which is very popular in
economics because it can calibrate non-reduced forms or structural models. Finally, the
last part of the first section is dedicated to non-parametric estimators. In the second section,
we study time series modeling, in particular ARMA processes and error correction models.
We also investigate state-space models, which encompass many dynamic models. We then
focus on volatility modeling, which is an important issue in risk management. Finally,
we discuss the application of spectral analysis. Most of the statistical tools presented in this
chapter are used in the next chapters, for example in the estimation of copula models, the
calibration of stressed scenarios or the implementation of credit scoring.


10.1 Estimation methods 
10.1.1 Linear regression 


Let $Y$ and $X$ be two random vectors. We consider the conditional expectation problem:
$$y = \mathbb{E}[Y \mid X = x] = m(x) \tag{10.1}$$
The underlying idea is to find an estimate $\hat{m}(x)$ of the function $m(x)$. In the general case,
this problem is extremely difficult to solve. However, if $(Y, X)$ is a Gaussian random vector,
the function $m(x)$ can then be determined by considering the Gaussian linear model:
$$Y = \beta^\top X + u \tag{10.2}$$
where $u \sim \mathcal{N}(0, \sigma^2)$. Most of the time, the joint distribution of $(Y, X)$ is unknown. In this
case, the linear model is estimated by applying least squares techniques to a given sample
$(\mathbf{Y}, \mathbf{X})$:
$$\mathbf{Y} = \mathbf{X}\beta + \mathbf{U}$$

Remark 118 In order to distinguish random variables and observations, we write matrices
and vectors that are related to observations in bold style.

10.1.1.1 Least squares estimation

Derivation of the OLS estimator We consider a training set of $n$ iid samples $(y_i, x_i)$.
For the $i$-th observation, we have:
$$y_i = \sum_{k=1}^{K} \beta_k x_{i,k} + u_i \tag{10.3}$$
The least squares estimate of the parameter vector $\beta$ is defined as follows:
$$\hat{\beta} = \arg\min \sum_{i=1}^{n} u_i^2$$
We introduce the following matrix notations: $\mathbf{Y}$ is the $n \times 1$ vector with elements $\mathbf{Y}_i = y_i$,
$\mathbf{X}$ is the $n \times K$ matrix defined as follows:
$$\mathbf{X} = \begin{pmatrix} x_{1,1} & \cdots & x_{1,K} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,K} \end{pmatrix}$$
and $\mathbf{U}$ is the $n \times 1$ vector with elements $\mathbf{U}_i = u_i$. In this case, the system of equations
(10.3) becomes:
$$\mathbf{Y} = \mathbf{X}\beta + \mathbf{U} \tag{10.4}$$
Let $\mathrm{RSS}(\beta)$ be the residual sum of squares. We have:
$$\mathrm{RSS}(\beta) = \sum_{i=1}^{n} u_i^2 = \mathbf{U}^\top \mathbf{U} = \mathbf{Y}^\top \mathbf{Y} - 2\beta^\top \mathbf{X}^\top \mathbf{Y} + \beta^\top \mathbf{X}^\top \mathbf{X}\beta$$
The least squares estimator verifies the set of normal equations $\partial_\beta\, \mathbf{U}^\top \mathbf{U} = 0$ and we deduce
that $-2\mathbf{X}^\top \mathbf{Y} + 2\mathbf{X}^\top \mathbf{X}\beta = 0$. The expression of the least squares estimator is then:
$$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y} \tag{10.5}$$
To obtain the expression of $\hat{\beta}$, we only need the assumption that the rank of the matrix $\mathbf{X}$ is
$K$. In this case, $\hat{\beta}$ is the solution of the least squares problem. To go further, we assume that
$(Y, X)$ is a Gaussian random vector. The solution of the conditional expectation problem
$\mathbb{E}[Y \mid X = x] = m(x)$ is then:
$$\hat{m}(x) = x^\top \hat{\beta} = x^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}$$
It means that the prediction of $Y$ given that $X = x$ is equal to $\hat{y} = x^\top \hat{\beta}$. If we consider the
training data $\mathbf{X}$, we obtain:
$$\hat{\mathbf{Y}} = \hat{m}(\mathbf{X}) = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y} = \mathbf{H}\mathbf{Y}$$
where $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top$ is called the 'hat matrix'¹. We notice that $\hat{m}(\mathbf{X})$ is a linear
predictor of $\mathbf{Y}$.
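
The derivation above can be checked numerically. The following is a minimal sketch, assuming NumPy and a simulated design matrix (the sample size, the true coefficients and the noise level are illustrative choices, not values from the text). It solves the normal equations for the estimator (10.5) and builds the hat matrix:

```python
import numpy as np

def ols(X, Y):
    """Least squares estimate from the normal equations (X'X) beta = X'Y,
    together with the hat matrix H = X (X'X)^{-1} X'."""
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ Y)
    H = X @ np.linalg.solve(XtX, X.T)
    return beta, H

# Simulated sample (hypothetical values): a constant and two regressors
rng = np.random.default_rng(0)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=n)

beta_hat, H = ols(X, Y)
Y_hat = H @ Y  # fitted values, identical to X @ beta_hat
```

Solving the linear system rather than inverting X'X explicitly is a numerical design choice. The hat matrix is idempotent and its trace is equal to K when X has full column rank.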


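Statistical inference Because $(Y, X)$ is a Gaussian random vector, it implies that $u = Y - \beta^\top X$
is a Gaussian random variable. We notice that:
$$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y} = \beta + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{U}$$
By assuming the exogeneity of the variables $X$, meaning that $\mathbb{E}[u \mid X = x] = 0$, we
deduce that $\hat{\beta}$ is an unbiased estimator:
$$\mathbb{E}[\hat{\beta}] = \beta + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top\, \mathbb{E}[\mathbf{U}] = \beta$$
We recall that $\mathbf{U} \sim \mathcal{N}(0, \sigma^2 I_n)$. It follows that:
$$\begin{aligned}
\mathrm{var}(\hat{\beta}) &= \mathbb{E}\left[ (\hat{\beta} - \beta)(\hat{\beta} - \beta)^\top \right] \\
&= \mathbb{E}\left[ (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{U}\mathbf{U}^\top \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} \right] \\
&= (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top\, \mathbb{E}[\mathbf{U}\mathbf{U}^\top]\, \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} \\
&= \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top (I_n)\, \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} \\
&= \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}
\end{aligned}$$
We conclude that:
$$\hat{\beta} \sim \mathcal{N}\left( \beta, \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1} \right)$$
In most cases, $\sigma^2$ is unknown and we have to estimate it. The vector of residuals is:
$$\hat{\mathbf{U}} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\hat{\beta}$$
We notice that $\mathbb{E}[\hat{\mathbf{U}}] = 0$ and $\mathrm{var}(\hat{\mathbf{U}}) = \sigma^2 (I_n - \mathbf{H})$. Because $\mathrm{RSS}(\hat{\beta}) = \hat{\mathbf{U}}^\top (I_n - \mathbf{H})\, \hat{\mathbf{U}}$
is a quadratic form, we can show that:
$$\hat{\sigma}^2 = \frac{\mathrm{RSS}(\hat{\beta})}{n - K}$$
is an unbiased estimator of $\sigma^2$ and $(n - K)\,\hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-K}$. In order to measure the model quality,
we consider the coefficient of determination or $R^2$. It is defined as follows:
$$R^2 = 1 - \frac{\mathrm{RSS}(\hat{\beta})}{\mathrm{TSS}}$$
where $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares. We have $R^2 \leq 1$. A high (resp. low)
level indicates a good (resp. bad) goodness-of-fit of the regression model.

¹ We interpret $\mathbf{H}$ as the orthogonal projection matrix generated by $\mathbf{X}$ implying that $\mathbf{H}$ is idempotent,
that is $\mathbf{H}\mathbf{H} = \mathbf{H}$. Indeed, we have:
$$\mathbf{H}\mathbf{H} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top = \mathbf{H}$$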


Example 100 We consider the data given in Table 10.1. We would like to explain the
dependent variable $y_i$ by four explanatory variables $x_1$, $x_2$, $x_3$ and $x_4$. There are 10 observations
and we note that $x_1$ is in fact a constant.

TABLE 10.1: Data of the linear regression problem

     i       y     x1     x2     x3     x4
     1     1.5    1.0    2.4    3.6    0.3
     2    20.4    1.0    1.1    3.8    5.9
     3    17.1    1.0    5.1    6.3    6.1
     4    30.9    1.0    2.7    2.4    9.5
     5    22.2    1.0    3.3    3.0    7.4
     6     9.1    1.0    1.0    5.4    4.9
     7    39.2    1.0    9.6    2.8    8.1
     8     3.1    1.0    2.9    4.4    1.0
     9     7.2    1.0    4.2    5.6    1.7
    10    27.6    1.0    8.1    1.7    5.4

We consider the linear regression model:
$$y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,4} + u_i$$
It follows that:
$$\mathbf{X}^\top \mathbf{X} = \begin{pmatrix} 10.000 & 40.400 & 39.000 & 50.300 \\ 40.400 & 235.980 & 143.660 & 224.830 \\ 39.000 & 143.660 & 172.460 & 179.170 \\ 50.300 & 224.830 & 179.170 & 339.790 \end{pmatrix}$$
and:
$$\mathbf{X}^\top \mathbf{Y} = \begin{pmatrix} 178.300 \\ 918.150 \\ 591.190 \\ 1209.440 \end{pmatrix}$$
We deduce that the estimates are:
$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \\ \hat{\beta}_4 \end{pmatrix} = \begin{pmatrix} 3.446 \\ 1.544 \\ -1.645 \\ 2.895 \end{pmatrix}$$
We can then compute the residuals. We obtain $\hat{u}_1 = -0.597$, $\hat{u}_2 = 4.427$, etc. The sum of
squared residuals is equal to $\mathrm{RSS}(\hat{\beta}) = 40.184$, which implies that $\hat{\sigma} = 2.588$. Therefore,
the estimate of the covariance matrix of $\hat{\beta}$ is:
$$\widehat{\mathrm{cov}}(\hat{\beta}) = \hat{\sigma}^2 (\mathbf{X}^\top \mathbf{X})^{-1} = \begin{pmatrix} 15.353 & -0.602 & -2.263 & -0.682 \\ -0.602 & 0.108 & 0.061 & -0.015 \\ -2.263 & 0.061 & 0.428 & 0.069 \\ -0.682 & -0.015 & 0.069 & 0.094 \end{pmatrix}$$
For this linear regression, the coefficient of determination is equal to:
$$R^2 = 1 - \frac{40.184}{1422.041} = 97.17\%$$
By construction, the standard error $\hat{\sigma}(\hat{\beta}_k)$ of the estimator $\hat{\beta}_k$ is the square root of the
$k$-th diagonal element of $\widehat{\mathrm{cov}}(\hat{\beta})$. The assumption $H_0 : \beta_k = b_k$ is then tested by computing
the t-statistic:
$$t = \frac{\hat{\beta}_k - b_k}{\hat{\sigma}(\hat{\beta}_k)}$$
and the associated p-value²:
$$p = 2\left( 1 - \mathbf{t}_{n-K}(|t|) \right)$$
where $\mathbf{t}_{n-K}$ is the cumulative distribution function of the Student's t distribution with $n - K$
degrees of freedom. For instance, we report the t-statistic and the p-value associated to the
hypothesis $H_0 : \beta_k = 0$ in Table 10.2. We cannot reject this assumption for the estimate $\hat{\beta}_1$ at the 10%
confidence level, meaning that $\beta_1$ is not significant.

TABLE 10.2: Results of the linear regression

    Parameter    Estimate    Standard error    t-statistic    p-value
    β1             3.4461            3.9183         0.8795     0.4130
    β2             1.5442            0.3289         4.6943     0.0033
    β3            −1.6454            0.6543        −2.5146     0.0457
    β4             2.8951            0.3071         9.4264     0.0001

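² The p-value is the estimated probability of rejecting the null hypothesis $H_0$.

Gauss-Markov theorem Let $\tilde{\beta} = A\mathbf{Y}$ be a linear estimator. The Gauss-Markov theorem
states that, among all linear unbiased estimators of $\beta$, $\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}$ has the smallest
variance:
$$\mathrm{var}(\tilde{\beta}) \geq \mathrm{var}(\hat{\beta})$$
In this case, we say that $\hat{\beta}$ is BLUE (best linear unbiased estimator). Let us write $\tilde{\beta}$ as
follows:
$$\tilde{\beta} = A\mathbf{Y} = A\mathbf{X}\beta + A\mathbf{U}$$
We have:
$$\mathbb{E}[\tilde{\beta}] = \mathbb{E}[A\mathbf{Y}] = A\mathbf{X}\beta + \mathbb{E}[A\mathbf{U}] = A\mathbf{X}\beta$$
We deduce that $\tilde{\beta}$ is unbiased if $A\mathbf{X} = I_K$. We also notice that:
$$\mathrm{var}(\tilde{\beta}) = \mathbb{E}\left[ (\tilde{\beta} - \beta)(\tilde{\beta} - \beta)^\top \right] = \mathbb{E}\left[ A\mathbf{U}\mathbf{U}^\top A^\top \right] = \sigma^2 (AA^\top)$$
We set $A = B + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top$. We have $B\mathbf{X} = 0$ because $A\mathbf{X} = I_K$. It follows that:
$$\begin{aligned}
AA^\top &= \left( B + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \right) \left( B^\top + \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \right) \\
&= BB^\top + (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top B^\top + B\mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} + (\mathbf{X}^\top \mathbf{X})^{-1} \\
&= BB^\top + (\mathbf{X}^\top \mathbf{X})^{-1}
\end{aligned}$$
Because the matrix $BB^\top$ is positive semi-definite, we finally deduce that:
$$\mathrm{var}(\tilde{\beta}) = \sigma^2 (AA^\top) = \sigma^2 \left( BB^\top + (\mathbf{X}^\top \mathbf{X})^{-1} \right) \geq \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$$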


10.1.1.2 Relationship with the conditional normal distribution 


Let us consider a Gaussian random vector defined as follows:
$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \Sigma_{y,y} & \Sigma_{y,x} \\ \Sigma_{x,y} & \Sigma_{x,x} \end{pmatrix} \right)$$
On page 1062, we show that the conditional distribution of $Y$ given $X = x$ is a multivariate
normal distribution where:
$$\mu_{y|x} = \mathbb{E}[Y \mid X = x] = \mu_y + \Sigma_{y,x} \Sigma_{x,x}^{-1} (x - \mu_x)$$
and:
$$\Sigma_{y|x} = \sigma^2[Y \mid X = x] = \Sigma_{y,y} - \Sigma_{y,x} \Sigma_{x,x}^{-1} \Sigma_{x,y}$$
We deduce that:
$$Y = \mu_y + \Sigma_{y,x} \Sigma_{x,x}^{-1} (x - \mu_x) + u$$
where $u$ is a centered Gaussian random variable with variance $\sigma^2 = \Sigma_{y|x}$. It follows that:
$$Y = \underbrace{\left( \mu_y - \Sigma_{y,x} \Sigma_{x,x}^{-1} \mu_x \right)}_{\beta_0} + \underbrace{\Sigma_{y,x} \Sigma_{x,x}^{-1}}_{\beta^\top} x + u \tag{10.6}$$
We recognize the linear regression of $Y$ on a constant and a set of exogenous variables $X$:
$$Y = \beta_0 + \beta^\top X + u$$
Moreover, we have:
$$R^2 = 1 - \frac{\sigma^2}{\Sigma_{y,y}} = \frac{\Sigma_{y,x} \Sigma_{x,x}^{-1} \Sigma_{x,y}}{\Sigma_{y,y}}$$

Example 101 We consider a Gaussian random vector $X = (X_1, X_2, X_3, X_4)$. The expected
values are equal to $\mu_1 = 2$, $\mu_2 = 5$, $\mu_3 = -4$ and $\mu_4 = 3$ whereas the standard deviations are
equal to $\sigma_1 = 1$, $\sigma_2 = 2$, $\sigma_3 = 0.5$ and $\sigma_4 = 1$. The correlation between the random variables
is given by the following matrix:
$$\rho = \begin{pmatrix} 1.00 & & & \\ 0.90 & 1.00 & & \\ 0.70 & 0.40 & 1.00 & \\ 0.60 & 0.50 & 0.30 & 1.00 \end{pmatrix}$$
For each random variable $X_i$, we can compute the conditional Gaussian regression using
the previous formulas:
$$X_i = \beta_0 + \sum_{k \neq i} \beta_k X_k + u$$
Results are reported in Table 10.3. For example, it means the linear regression of $X_1$ on
$X_2$, $X_3$ and $X_4$ is:
$$X_1 = 2.974 + 0.335 \cdot X_2 + 0.774 \cdot X_3 + 0.148 \cdot X_4 + u$$
where $u \sim \mathcal{N}(0, 0.19^2)$ and the associated $R^2$ is equal to 96.39%.

TABLE 10.3: Results of the conditional Gaussian regression

    Y        β0       β1       β2       β3       β4      σ_u       R²
    X1     2.974             0.335    0.774    0.148   19.01%   96.39%
    X2    −7.205    2.667            −1.949   −0.308   53.59%   92.82%
    X3    −4.017    1.000   −0.317            −0.133   21.60%   81.33%
    X4    −4.273    2.091   −0.545   −1.455            71.35%   49.09%
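
The first row of Table 10.3 can be recovered directly from the moments of Example 101 using the formulas behind Equation (10.6). This is a minimal sketch assuming NumPy. The function name `conditional_regression` is illustrative:

```python
import numpy as np

# Moments of Example 101
mu = np.array([2.0, 5.0, -4.0, 3.0])
sigma = np.array([1.0, 2.0, 0.5, 1.0])
rho = np.array([
    [1.0, 0.9, 0.7, 0.6],
    [0.9, 1.0, 0.4, 0.5],
    [0.7, 0.4, 1.0, 0.3],
    [0.6, 0.5, 0.3, 1.0],
])
Sigma = np.outer(sigma, sigma) * rho  # covariance matrix

def conditional_regression(i):
    """Regression of X_i on the other components:
    beta = Sigma_xx^{-1} Sigma_xy, beta0 = mu_y - beta' mu_x,
    residual variance = Sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy."""
    others = [k for k in range(len(mu)) if k != i]
    S_xx = Sigma[np.ix_(others, others)]
    S_xy = Sigma[others, i]
    beta = np.linalg.solve(S_xx, S_xy)
    beta0 = mu[i] - beta @ mu[others]
    var_u = Sigma[i, i] - S_xy @ beta
    r2 = 1.0 - var_u / Sigma[i, i]
    return beta0, beta, np.sqrt(var_u), r2

beta0, beta, sigma_u, r2 = conditional_regression(0)  # regression of X1
```

Calling `conditional_regression(i)` for the other indices gives the remaining rows of the table.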


Remark 119 The previous analysis raises the question of the status of the variables Y 
and X in the linear regression framework. In this model, Y is called the dependent variable 
and X are called the independent (or explanatory) variables. This implies that there is a 
relationship from X to Y. In some way, linear regression has a strong connotation of an 
explicit directional (or causal) relationship. However, the previous example shows clearly 
that linear regression does not mean causality! 


Indeed, linear regression may be viewed as another way to interpret the correlation
between random variables. Let us consider the case where $X$ is a one-dimensional random
variable. We note $\rho_{x,y}$ the correlation between $X$ and $Y$ whereas $\sigma_x$ and $\sigma_y$ are their standard
deviations. In this case, we have:
$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho_{x,y}\sigma_x\sigma_y \\ \rho_{x,y}\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}$$
The conditional Gaussian regression (10.6) becomes:
$$Y = \beta_0 + \beta X + u$$
where:
$$\beta = \rho_{x,y}\frac{\sigma_y}{\sigma_x}$$
and:
$$\beta_0 = \mu_y - \rho_{x,y}\frac{\sigma_y}{\sigma_x}\mu_x$$
We also deduce that the expression of the $R^2$ statistic is:
$$R^2 = \frac{(\rho_{x,y}\sigma_x\sigma_y)^2}{\sigma_x^2\sigma_y^2} = \rho_{x,y}^2$$
The coefficient of determination is then the square of the correlation coefficient, meaning
that their significance is not of the same magnitude. Indeed, a value of 50% for the $R^2$
statistic corresponds to a value of 70% for the correlation.


The previous analysis shows that:
$$\beta = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$$
The single risk factor model of Sharpe (1964) exploits this result since we have:
$$R_{i,t} = \alpha_i + \beta_i R_{m,t} + u_{i,t}$$
or equivalently:
$$\beta_i = \frac{\mathrm{cov}(R_i, R_m)}{\mathrm{var}(R_m)}$$
where $R_i$ is the asset's return and $R_m$ is the market's return.
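
As an illustration, the following sketch checks on simulated returns that the OLS slope of the single-factor regression coincides with the ratio of the covariance to the variance of the regressor, and that the R² is the squared correlation. It assumes NumPy. The return parameters (a beta of 1.2 and the volatilities) are hypothetical values chosen for the simulation:

```python
import numpy as np

# Simulated market and asset returns (hypothetical parameters)
rng = np.random.default_rng(42)
T = 500
r_m = 0.01 * rng.normal(size=T)
r_i = 0.001 + 1.2 * r_m + 0.005 * rng.normal(size=T)

# Beta as cov(R_i, R_m) / var(R_m)
cov = np.cov(r_m, r_i)
beta_cov = cov[0, 1] / cov[0, 0]

# Beta as the OLS slope of R_i on a constant and R_m
X = np.column_stack([np.ones(T), r_m])
alpha_ols, beta_ols = np.linalg.lstsq(X, r_i, rcond=None)[0]

# R-squared of the regression versus squared correlation
u = r_i - X @ np.array([alpha_ols, beta_ols])
r2 = 1.0 - (u @ u) / np.sum((r_i - r_i.mean()) ** 2)
corr = np.corrcoef(r_m, r_i)[0, 1]
```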


10.1.1.3 The intercept problem 


Example 102 We consider the following data with 20 observations:

    y_i   13.9  11.5  14.9  14.6  13.7  17.3  18.1  14.8  14.7  14.7
    x_i    4.4   3.4   4.9   4.4   4.4   6.3   6.6   4.8   4.5   4.6
    y_i   16.1  14.6  15.2  18.1  11.5  14.0  18.4  15.0  12.0  14.8
    x_i    5.7   5.0   5.2   6.5   3.2   4.5   6.9   5.2   3.1   4.9

We want to explain the dependent variable $Y$ by the explanatory variable $X$.
If we include a constant in the regression model, we obtain:
$$Y = 5.8868 + 1.8291 \cdot X + u \quad \text{with } u \sim \mathcal{N}(0, 0.3522^2) \tag{10.7}$$

FIGURE 10.1: Illustration of the intercept problem (plotted series: data, OLS with intercept, OLS without intercept)


Without the intercept, the linear regression becomes:
$$Y = 2.9730 \cdot X + u \quad \text{with } u \sim \mathcal{N}(0.2529, 1.2980^2) \tag{10.8}$$
The corresponding fitted curves are reported in Figure 10.1. By omitting the constant, we
have overestimated the slope $\beta$ of the curve.

The previous example shows that a linear regression is valid only if we include the
intercept in the model. Indeed, without the intercept, the residuals are not centered: $\mathbb{E}[u] \neq 0$.
If we consider the conditional Gaussian regression, we have the relationship $\beta_0 = \mu_y - \beta^\top \mu_x$.
By omitting the constant, that is by writing $Y = \beta^\top X + u$, we set $\beta_0 = 0$ and the previous relationship
does not hold any more. In this case, the residuals incorporate the mean effect of the data.
Indeed, we have $\mathbb{E}[Y] = \beta^\top \mathbb{E}[X] + \mathbb{E}[u]$ meaning that $\mathbb{E}[u] = \mu_y - \beta^\top \mu_x$ is not necessarily
equal to zero (see Exercise 10.3.2 on page 705).
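
The two regressions of Example 102 can be reproduced with a few lines. This sketch assumes NumPy and uses the 20 observations listed in the example. The variable names are illustrative:

```python
import numpy as np

# Data of Example 102
y = np.array([13.9, 11.5, 14.9, 14.6, 13.7, 17.3, 18.1, 14.8, 14.7, 14.7,
              16.1, 14.6, 15.2, 18.1, 11.5, 14.0, 18.4, 15.0, 12.0, 14.8])
x = np.array([4.4, 3.4, 4.9, 4.4, 4.4, 6.3, 6.6, 4.8, 4.5, 4.6,
              5.7, 5.0, 5.2, 6.5, 3.2, 4.5, 6.9, 5.2, 3.1, 4.9])

# Regression with a constant: y = b0 + b1 * x + u
X1 = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
u_with = y - (b0 + b1 * x)

# Regression without a constant: y = b * x + u
b_no = (x @ y) / (x @ x)
u_without = y - b_no * x

mean_with = u_with.mean()        # zero up to rounding errors
mean_without = u_without.mean()  # non-zero: residuals absorb the mean effect
```

With the intercept, the residuals are centered by construction. Without it, their mean is different from zero and the slope is larger.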


Remark 120 Let us consider the linear regression with a constant. We have $Y = \beta_0 + \beta^\top X + u$.
It follows that $\mu_y = \beta_0 + \beta^\top \mu_x$ because the residuals are centered. We deduce that:
$$Y - \mu_y = \beta^\top (X - \mu_x) + u$$
By considering the centered data instead of the original data, the intercept problem vanishes.
This type of transformation is common in statistics and finance. Generally, raw data have
to be analyzed and modified if we want to obtain more robust relationships. Normalizing,
using logarithmic scale or creating dummy variables are some examples of data processing.


10.1.1.4 Coefficient of determination 
We have defined the coefficient of determination as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
We can show that $R^2 \leq 1$. If one of the independent variables is a constant, the linear
regression model becomes:
$$y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{i,k} + u_i$$
In this case, we have $0 \leq R^2 \leq 1$. Testing the hypothesis $H_0 : \beta_1 = \ldots = \beta_K = 0$ is
equivalent to consider the Fisher test:


(n- K) Rè 


cam PER 


Therefore, the use of $R^2$ is valid when there is a constant in the linear regression. It measures
the significance of the model versus the naive model: $y_i = \bar{y} + u_i$. If the constant is omitted,
$R^2$ can take negative values and is not pertinent. In this case, it is better to use the uncentered
coefficient of determination:
$$R_c^2 = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} y_i^2}$$
We can show that $0 \leq R_c^2 \leq 1$. One of the drawbacks when using $R^2$ or $R_c^2$ comes from the
fact that the coefficient of determination increases with the number of exogenous variables:
the more explanatory variables we add, the higher the R-squared. To correct this
effect, we can use adjusted coefficients of determination:
$$\bar{R}_c^2 = 1 - \frac{\left( \sum_{i=1}^{n} \hat{u}_i^2 \right) / (n - K)}{\left( \sum_{i=1}^{n} y_i^2 \right) / n}$$
and:
$$\bar{R}^2 = 1 - \frac{\left( \sum_{i=1}^{n} \hat{u}_i^2 \right) / (n - K)}{\left( \sum_{i=1}^{n} (y_i - \bar{y})^2 \right) / (n - 1)}$$

10.1.1.5 Extension to weighted least squares regression 


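Definition The weighted least squares (or WLS) estimator is defined by:
$$\hat{\beta} = \arg\min \sum_{i=1}^{n} w_i u_i^2$$
where $w_i$ is the weight associated to the $i$-th observation. It is obvious that the analytical
solution is:
$$\hat{\beta} = (\mathbf{X}^\top \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{W} \mathbf{Y}$$
where $\mathbf{W}$ is a diagonal matrix with $\mathbf{W}_{i,i} = w_i$.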
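
A minimal sketch of the WLS solution, assuming NumPy and simulated data (the weights and coefficients are illustrative). The diagonal matrix W is never formed explicitly: multiplying the columns of X' by the weights gives the same product X'W.

```python
import numpy as np

def wls(X, Y, w):
    """Weighted least squares: solves (X'WX) beta = X'WY with W = diag(w)."""
    XtW = X.T * w  # broadcasting scales column i of X' by w_i, i.e. X' @ diag(w)
    return np.linalg.solve(XtW @ X, XtW @ Y)

# Simulated sample (hypothetical values)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
Y = X @ np.array([0.5, 2.0]) + rng.normal(size=30)
w = rng.uniform(0.5, 2.0, size=30)

beta_wls = wls(X, Y, w)
beta_unit = wls(X, Y, np.ones(30))  # unit weights give back the OLS estimator
```

Equivalently, WLS is OLS applied to the data rescaled by the square root of the weights.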


Robust regression Let us consider the least squares problem: 
p= argmin X- p(w) =argmin Š p (yi — x} B) (10.9) 
i=1 i=1 


where p (u) = u?. Huber (1964) suggests to generalize this method by considering other 
functions p(u). In this approach (called M-estimation), the function p(u) satisfies some 
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properties: p(u) > 0, p(0) = 0, p (u) = p(—u) and p (u1) > p (u2) if |u1| > |ug|. If we note 
qp (u) = p' (u), the first-order conditions of Problem (10.9) are: 


n 


Sov (yi-2)8)tin=0 forall k=1,...,K 


i=1 


We deduce that: 


By writing w; = w (ui) /ui, we finally obtain: 
5 wi (yi — æ] B) tik =0 
i=1 


We notice that the system of equations corresponds exactly to the first-order conditions of
the WLS problem. The only difference is the endogeneity of the weights $w_i$ that depend on
the residuals $u_i$. To solve this system, we use the following iterative algorithm:

1. we choose an initial value $\beta^{(0)}$;

2. we calculate the diagonal matrix $W^{(j-1)}$ with $w_i = \psi(u_i) / u_i$ where $u_i = y_i - x_i^\top \beta^{(j-1)}$;

3. at the $j$-th iteration, we calculate the WLS estimator:

$$\beta^{(j)} = \left(X^\top W^{(j-1)} X\right)^{-1} X^\top W^{(j-1)} Y$$

4. we repeat steps 2 and 3 until the convergence of the algorithm: $\left\| \beta^{(j)} - \beta^{(j-1)} \right\| \le \varepsilon$;

5. the M-estimator $\hat{\beta}$ is equal to $\beta^{(\infty)}$.
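The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `huber_weights` and `m_estimator`, the threshold `c = 1.345`, the OLS starting value and the toy data are not from the text, and the weights use the conventional Huber normalization ($\psi(u) = u$ if $|u| \le c$ and $\psi(u) = c \cdot \operatorname{sign}(u)$ otherwise), which differs from the loss written in this section by a constant factor.

```python
import numpy as np

def huber_weights(u, c):
    """Weights w_i = psi(u_i) / u_i for the conventional Huber psi function."""
    abs_u = np.maximum(np.abs(u), 1e-12)  # guard against division by zero
    return np.where(abs_u <= c, 1.0, c / abs_u)

def m_estimator(X, y, c=1.345, eps=1e-10, max_iter=500):
    """Iteratively reweighted least squares (steps 1 to 5 of the algorithm)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # step 1: start from OLS (assumption)
    for _ in range(max_iter):
        w = huber_weights(y - X @ beta, c)             # step 2: weights from residuals
        XtW = X.T * w                                  # X^T W without building the diagonal W
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)   # step 3: WLS estimator
        if np.linalg.norm(beta_new - beta) <= eps:     # step 4: convergence test
            return beta_new
        beta = beta_new
    return beta                                        # step 5: last iterate

# Toy data (hypothetical): an exact line y = 1 + 2x with one large outlier.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[-1] += 50.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = m_estimator(X, y)
print("OLS:", beta_ols, "Huber:", beta_huber)
```

On this toy sample the outlier pulls the OLS slope far from 2, whereas the reweighted estimate stays close to it, because the outlier receives the weight $c / |u_i|$ instead of 1.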


The most well-known M-estimator is obtained by setting $\rho(u) = |u|$ and $\psi(u) = \operatorname{sign}(u)$
and is called LAD (least absolute deviation). A variant is proposed by Huber (1964):

$$\rho(u) = \begin{cases} u^2 & \text{if } |u| \le c \\ c\,|u| & \text{if } |u| > c \end{cases}$$

These two estimators are less sensitive to outliers than the OLS estimator. This is why we
call them robust estimators.


Quantile regression Let us consider a random variable $Y$ with probability distribution
$F$. The quantile of order $\alpha$ of $Y$ is defined by:

$$Q(\alpha) = \inf \{ y \mid F(y) \ge \alpha \}$$

The estimator $\hat{q}_\alpha$ of $Q(\alpha)$ is given by:

$$\hat{q}_\alpha = \arg\min_{q} \sum_{y_i \ge q} \alpha \, |y_i - q| + \sum_{y_i < q} (1 - \alpha) \, |y_i - q|$$

or:

$$\hat{q}_\alpha = \arg\min_{q} \sum_{i=1}^{n} \chi_\alpha (y_i - q)$$

where³ $\chi_\alpha(u) = u \cdot (\alpha - \mathbb{1}\{u < 0\})$. If we consider the Gaussian linear model $Y = X\beta + U$
and apply the previous approach to the random variable $U$, the estimator $\hat{\beta}_\alpha$ of the quantile
regression of order $\alpha$ is:

$$\hat{\beta}_\alpha = \arg\min_{\beta} \sum_{i=1}^{n} \chi_\alpha \left(y_i - x_i^\top \beta\right)$$

In the case $\alpha = 50\%$, we obtain the median regression:

$$\hat{\beta}_{50\%} = \arg\min_{\beta \in \mathbb{R}^K} \sum_{i=1}^{n} \left| y_i - x_i^\top \beta \right|$$

It consists in minimizing the sum of absolute values of residuals.


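Since we have:

$$y_i = x_i^\top \beta_\alpha + u_i = x_i^\top \left(\beta^+ - \beta^-\right) + u_i^+ - u_i^-$$

we obtain the linear programming problem:

$$\hat{z} = \arg\min c^\top z \quad \text{s.t.} \quad \begin{cases} Az = b \\ z \ge 0 \end{cases}$$

where $Y$ and $X$ are the vector of $y_i$'s and the matrix of $x_{i,k}$'s, $z = \left(\beta^+, \beta^-, U^+, U^-\right)$,
$A = \begin{pmatrix} X & -X & I_n & -I_n \end{pmatrix}$, $b = Y$ and $c = \left(0_K, 0_K, \alpha \mathbf{1}_n, (1 - \alpha) \mathbf{1}_n\right)^\top$. The standard
approach to find the estimator $\hat{\beta}_\alpha = \hat{\beta}^+ - \hat{\beta}^-$ is to solve this LP program using interior
points methods (Koenker, 2005).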
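The linear program above can be passed almost verbatim to a generic LP solver. The sketch below is an illustration, not the approach of Koenker (2005): it assumes SciPy is available and uses `scipy.optimize.linprog` with the HiGHS backend in place of a dedicated interior point code, and the function name `quantile_regression_lp` and the toy sample are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, y, alpha):
    """Quantile regression of order alpha solved as the LP: min c'z s.t. Az = b, z >= 0."""
    n, K = X.shape
    # z = (beta_plus, beta_minus, u_plus, u_minus)
    c = np.concatenate([np.zeros(2 * K), alpha * np.ones(n), (1.0 - alpha) * np.ones(n)])
    A = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    return z[:K] - z[K:2 * K]  # beta = beta_plus - beta_minus

# Toy check (hypothetical data): with a constant as the only regressor,
# the median regression must return the sample median.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
X = np.ones((5, 1))
beta_median = quantile_regression_lp(X, y, alpha=0.5)
print(beta_median)
```

Splitting $\beta$ and $u$ into positive and negative parts is what makes the absolute values linear, at the cost of $2K + 2n$ variables.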


An alternative method is to use the robust regression with:

$$\rho(u) = \chi_\alpha(u)$$

and:
$$\psi(u) = \alpha - \mathbb{1}\{u < 0\}$$

In the case $\alpha = 50\%$, we obtain $\rho(u) = u \cdot (0.5 - \mathbb{1}\{u < 0\}) = 0.5 \cdot |u|$. The estimator of
the median regression is then the LAD estimator.


10.1.2 Maximum likelihood estimation 
10.1.2.1 Definition of the estimator 


We consider a sample $Y = (y_1, \ldots, y_n)$ of $n$ observations. We assume that the probability
of the sample may be written as a parametric function:

$$\Pr\{Y_1 = y_1, \ldots, Y_n = y_n\} = \mathcal{L}(Y \mid \theta) = \mathcal{L}(\theta \mid Y)$$


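³Because we have:

$$\chi_\alpha(u) = u \cdot (\alpha - \mathbb{1}\{u < 0\}) = \begin{cases} (1 - \alpha) \cdot |u| & \text{if } u < 0 \\ \alpha \cdot |u| & \text{if } u \ge 0 \end{cases} = \alpha \cdot |u| \cdot \mathbb{1}\{u \ge 0\} + (1 - \alpha) \cdot |u| \cdot \mathbb{1}\{u < 0\}$$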
where $\theta$ is a $K \times 1$ vector of parameters to estimate. The function $\mathcal{L}$ is called the likelihood
function whereas the maximum likelihood estimator (MLE) is defined as follows:

$$\hat{\theta} = \arg\max_{\theta \in \Theta} \mathcal{L}(\theta \mid Y) \qquad (10.10)$$

where $\Theta$ is the parameter space⁴. The principle of maximum likelihood is to find the value
of $\theta$ that maximizes the probability of the sample data $Y$. This is an inverse probability
problem because we do not want to calculate the probability $\mathcal{L}(Y \mid \theta)$ of the sample given
a model and a parameter vector $\theta$, but we want to estimate the implicit parameter $\theta$ given
the sample and the model.


We have:
$$\Pr\{Y_1 = y_1, \ldots, Y_n = y_n\} = \Pr\{Y_1 = y_1\} \cdot \Pr\{Y_2 = y_2 \mid Y_1 = y_1\} \cdots \Pr\{Y_n = y_n \mid Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1}\}$$

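Assuming that the observations are independent simplifies the computation of the likelihood
function:

$$\mathcal{L}(\theta \mid Y) = \prod_{i=1}^{n} \Pr\{Y_i = y_i\} = \prod_{i=1}^{n} \mathcal{L}_i(\theta \mid y_i)$$

where $\mathcal{L}_i(\theta \mid y_i) = \Pr\{Y_i = y_i\}$. $\mathcal{L}_i$ is called the likelihood of the observation $i$ and
corresponds to its density. Generally, the optimization problem (10.10) is replaced by the
following, which is more tractable:

$$\hat{\theta} = \arg\max_{\theta \in \Theta} \ell(\theta \mid Y) \qquad (10.11)$$

where $\ell_i(\theta \mid y_i) = \ln \mathcal{L}_i(\theta \mid y_i)$ is the log-likelihood function for the observation $i$ and
$\ell(\theta \mid Y) = \sum_{i=1}^{n} \ell_i(\theta \mid y_i)$. The gradient of the log-likelihood function is called the score
function:

$$S(\theta) = \frac{\partial \ell(\theta \mid Y)}{\partial \theta}$$

At the optimum, we have $S(\hat{\theta}) = 0$.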


Example 103 (Bernoulli distribution) We consider the sample $Y = \{y_1, \ldots, y_n\}$ where
$y_i$ takes the value 1 with probability $p$ or 0 with probability $1 - p$. We note $n_0$ and $n_1$ the
number of observations whose values are respectively equal to 0 and 1. We have $n_0 + n_1 = n$.

We have:
$$\Pr\{Y_i = y_i\} = (1 - p)^{1 - y_i} \, p^{y_i}$$

⁴It is generally equal to $\mathbb{R}^K$.
It follows that the log-likelihood function is:

$$\ell(p) = \sum_{i=1}^{n} \ln \Pr\{Y_i = y_i\} = \sum_{i=1}^{n} \left( (1 - y_i) \ln(1 - p) + y_i \ln p \right) = n_0 \ln(1 - p) + n_1 \ln p$$

The first-order condition is:

$$\frac{\partial \ell(p)}{\partial p} = 0 \Leftrightarrow \frac{n_1}{p} - \frac{n_0}{1 - p} = 0$$

We deduce the expression of the MLE:

$$\hat{p} = \frac{n_1}{n_0 + n_1} = \frac{n_1}{n}$$

In Figure 10.2, we have represented the log-likelihood function $\ell(p)$ with respect to the
parameter $p$ when $n_0 = 46$ and $n_1 = 74$. We verify that the maximum is reached when $p$
takes the value $74/120 \approx 0.6167$.
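The closed-form result can be checked numerically. The short sketch below is only an illustration (it assumes SciPy is available, and the variable names are hypothetical): it maximizes $\ell(p) = n_0 \ln(1 - p) + n_1 \ln p$ on $(0, 1)$ for $n_0 = 46$ and $n_1 = 74$ and compares the result with $n_1 / n$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

n0, n1 = 46, 74  # counts of zeros and ones, as in Example 103

def log_likelihood(p):
    """Bernoulli log-likelihood written in terms of the counts n0 and n1."""
    return n0 * np.log(1.0 - p) + n1 * np.log(p)

# Maximize the log-likelihood by minimizing its opposite on (0, 1).
res = minimize_scalar(lambda p: -log_likelihood(p),
                      bounds=(1e-6, 1.0 - 1e-6), method="bounded")
p_hat = res.x
p_closed_form = n1 / (n0 + n1)
print(p_hat, p_closed_form)
```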


[Figure: log-likelihood $\ell(p)$ plotted against $p \in [0, 1]$, with the maximum marked at MLE = 0.6167]

FIGURE 10.2: Log-likelihood function of the Bernoulli distribution


10.1.2.2 Asymptotic distribution 


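Let $\hat{\theta}_n$ denote the maximum likelihood estimator obtained with a sample of $n$ observations.
We can show that $\hat{\theta}_n$ is asymptotically normally distributed, unbiased and efficient:

$$\sqrt{n} \left( \hat{\theta}_n - \theta_0 \right) \to \mathcal{N}\left(0, \mathcal{J}^{-1}(\theta_0)\right)$$

where $\theta_0$ is the true value and $\mathcal{J}(\theta_0)$ is the Fisher information matrix for an observation:

$$\mathcal{J}(\theta_0) = -\mathbb{E}\left[ \frac{\partial^2 \ell_i(\theta_0)}{\partial \theta \, \partial \theta^\top} \right]$$

Another useful result is the information matrix property:

$$\mathcal{I}(\theta_0) = n \, \mathcal{J}(\theta_0) = \mathbb{E}\left[ -\frac{\partial^2 \ell(\theta_0)}{\partial \theta \, \partial \theta^\top} \right] = \mathbb{E}\left[ \frac{\partial \ell(\theta_0)}{\partial \theta} \frac{\partial \ell(\theta_0)}{\partial \theta^\top} \right]$$

This identity comes from the fact that we have:
$$\mathbb{E}[S(\theta_0)] = \mathbb{E}\left[ \frac{\partial \ell(\theta_0)}{\partial \theta} \right] = 0$$
and:
$$\operatorname{var}(S(\theta_0)) = \mathbb{E}\left[ \left(S(\theta_0) - \mathbb{E}[S(\theta_0)]\right) \left(S(\theta_0) - \mathbb{E}[S(\theta_0)]\right)^\top \right] = \mathbb{E}\left[ \frac{\partial \ell(\theta_0)}{\partial \theta} \frac{\partial \ell(\theta_0)}{\partial \theta^\top} \right] = \mathcal{I}(\theta_0)$$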


Remark 121 Let $\tilde{\theta}$ be an unbiased estimator of $\theta_0$. The Cramer-Rao theorem states that
the variance of $\tilde{\theta}$ is bounded below by the inverse of the information matrix:

$$\operatorname{var}(\tilde{\theta}) \ge \mathcal{I}(\theta_0)^{-1}$$
It follows that the ML estimator is BUE (best unbiased estimator).


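Let $h(\theta)$ be a function of the parameter vector $\theta$. The invariance property states that
$h(\hat{\theta}_n)$ converges almost surely to $h(\theta_0)$ and we have:

$$\sqrt{n} \left( h(\hat{\theta}_n) - h(\theta_0) \right) \to \mathcal{N}\left(0, \frac{\partial h(\theta_0)}{\partial \theta^\top} \, \mathcal{J}^{-1}(\theta_0) \, \frac{\partial h(\theta_0)^\top}{\partial \theta} \right)$$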


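Let us consider again the Bernoulli distribution (Example 103 on page 615). We recall
that:
$$\frac{\partial \ell(p)}{\partial p} = \frac{\sum_{i=1}^{n} y_i}{p} - \frac{\sum_{i=1}^{n} (1 - y_i)}{1 - p}$$
It follows that:
$$\frac{\partial^2 \ell(p)}{\partial p^2} = -\frac{\sum_{i=1}^{n} y_i}{p^2} - \frac{\sum_{i=1}^{n} (1 - y_i)}{(1 - p)^2}$$
We have:
$$\mathcal{I}(p) = \mathbb{E}\left[ \frac{\sum_{i=1}^{n} Y_i}{p^2} + \frac{\sum_{i=1}^{n} (1 - Y_i)}{(1 - p)^2} \right] = \frac{np}{p^2} + \frac{n(1 - p)}{(1 - p)^2} = \frac{n}{p(1 - p)}$$
and:
$$\mathcal{I}(p)^{-1} = \frac{p(1 - p)}{n}$$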
10.1.2.3 Statistical inference
Estimating the covariance matrix In practice, the covariance matrix of $\hat{\theta}$ is calculated
as $\mathcal{I}(\hat{\theta})^{-1} = \left(n \mathcal{J}(\hat{\theta})\right)^{-1}$. Most of the time, the information matrix is however difficult
to calculate analytically, and we prefer to estimate the covariance matrix as the inverse of
the opposite of the Hessian matrix, $\operatorname{var}(\hat{\theta}) = \left(-H(\hat{\theta})\right)^{-1}$, where:

$$H(\theta) = \frac{\partial^2 \ell(\theta)}{\partial \theta \, \partial \theta^\top}$$

Two other estimators are very popular (Davidson and MacKinnon, 2004). The first one is
the outer product of gradients (OPG) estimator:

$$\operatorname{var}(\hat{\theta}) = \left( J(\hat{\theta})^\top J(\hat{\theta}) \right)^{-1}$$

where $J(\theta)$ is the Jacobian matrix of the log-likelihood function:

$$J(\theta) = \left( \frac{\partial \ell_i(\theta)}{\partial \theta_k} \right)_{i,k}$$

The second estimator is the sandwich estimator:

$$\operatorname{var}(\hat{\theta}) = H(\hat{\theta})^{-1} J(\hat{\theta})^\top J(\hat{\theta}) H(\hat{\theta})^{-1}$$

If the model is well-specified, the three estimators of the covariance matrix are equivalent.
If it is not the case, it is better to use the sandwich estimator, which is more robust to model
misspecification.


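From the covariance matrix $\operatorname{var}(\hat{\theta})$, we estimate the standard error of $\hat{\theta}_k$ by calculating
the square root of the $k$-th diagonal element:

$$\hat{\sigma}(\hat{\theta}_k) = \sqrt{\left( \operatorname{var}(\hat{\theta}) \right)_{k,k}}$$

We can then test the hypothesis $H_0 : \theta_k = \xi_k$ by computing the t-statistic:

$$t = \frac{\hat{\theta}_k - \xi_k}{\hat{\sigma}(\hat{\theta}_k)}$$

Asymptotically, we have $t \sim \mathcal{N}(0, 1)$. In practice, we assume that $t \sim t_{n-K}$ in the case of
small samples.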


Example 104 (Modeling LGD with the beta distribution) We consider the following
sample $Y = (y_1, \ldots, y_n)$ of loss given default:

{68%, 90%, 22%, 45%, 17%, 25%, 89%, 65%, 75%, 56%, 87%, 92%, 46%}

We assume that the LGD parameter follows a beta distribution $\mathcal{B}(\alpha, \beta)$.
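We have:
$$\ell(\alpha, \beta) = (\alpha - 1) \sum_{i=1}^{n} \ln y_i + (\beta - 1) \sum_{i=1}^{n} \ln(1 - y_i) - n \ln B(\alpha, \beta)$$

The first-order conditions are:

$$\frac{\partial \ell(\alpha, \beta)}{\partial \alpha} = \sum_{i=1}^{n} \ln y_i - n \frac{\partial_\alpha B(\alpha, \beta)}{B(\alpha, \beta)} = 0$$

and:

$$\frac{\partial \ell(\alpha, \beta)}{\partial \beta} = \sum_{i=1}^{n} \ln(1 - y_i) - n \frac{\partial_\beta B(\alpha, \beta)}{B(\alpha, \beta)} = 0$$

Therefore, it is not possible to find the analytical expression of $\hat{\alpha}$ and $\hat{\beta}$. However, we can use
numerical optimization to maximize the log-likelihood function and we obtain $\hat{\alpha} = 1.836$,
$\hat{\beta} = 1.248$ and $\ell(\hat{\alpha}, \hat{\beta}) = 1.269$. The computation of the Hessian matrix gives:

$$H(\hat{\theta}) = \begin{pmatrix} \partial_\alpha^2 \ell(\hat{\alpha}, \hat{\beta}) & \partial_{\alpha,\beta}^2 \ell(\hat{\alpha}, \hat{\beta}) \\ \partial_{\alpha,\beta}^2 \ell(\hat{\alpha}, \hat{\beta}) & \partial_\beta^2 \ell(\hat{\alpha}, \hat{\beta}) \end{pmatrix} = \begin{pmatrix} -4.3719 & 4.9723 \\ 4.9723 & -10.6314 \end{pmatrix}$$

We deduce the covariance matrix:
$$\operatorname{var}(\hat{\theta}) = -H(\hat{\theta})^{-1} = \begin{pmatrix} 0.4887 & 0.2286 \\ 0.2286 & 0.2010 \end{pmatrix}$$
Finally, we obtain the results reported in Table 10.4.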


TABLE 10.4: Results of the maximum likelihood estimation

Parameter   Estimate   Standard error   t-statistic   p-value
$\alpha$    1.8356     0.6990           2.6258        0.0236
$\beta$     1.2478     0.4483           2.7834        0.0178
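The estimates of Example 104 can be reproduced approximately with a generic optimizer. The sketch below is one possible way to do it, not the procedure used for Table 10.4: it assumes SciPy is available, maximizes the log-likelihood with the Nelder-Mead method on log-parameters (to keep $\alpha$ and $\beta$ positive), and approximates the Hessian by central finite differences. The function names and the step size `h` are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

lgd = np.array([0.68, 0.90, 0.22, 0.45, 0.17, 0.25, 0.89,
                0.65, 0.75, 0.56, 0.87, 0.92, 0.46])

def log_likelihood(theta, y=lgd):
    """Log-likelihood of the beta distribution B(a, b)."""
    a, b = theta
    return ((a - 1.0) * np.log(y).sum() + (b - 1.0) * np.log1p(-y).sum()
            - y.size * betaln(a, b))

# Optimize over log-parameters so that a and b stay positive.
res = minimize(lambda log_theta: -log_likelihood(np.exp(log_theta)),
               x0=np.zeros(2), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 5000})
theta_hat = np.exp(res.x)

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at x."""
    k = x.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

H = numerical_hessian(log_likelihood, theta_hat)
cov = np.linalg.inv(-H)            # inverse of the opposite of the Hessian
std_err = np.sqrt(np.diag(cov))
t_stat = theta_hat / std_err       # t-statistics for H0: parameter = 0
print(theta_hat, std_err, t_stat)
```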


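Hypothesis testing We now consider the general hypothesis $H_0 : C(\theta) = c$ where $C(\theta)$
is a non-linear function from $\mathbb{R}^K$ to $\mathbb{R}^g$, $c$ is a vector of dimension $g$ and $g$ is the number of
restrictions. We note $\hat{\theta}$ the unconstrained estimator and $\hat{\theta}_c$ the constrained estimator:

$$\hat{\theta}_c = \arg\max \ell(\theta) \quad \text{s.t.} \quad C(\theta) = c$$

$H_0$ can be tested using Wald, likelihood ratio (LR) and Lagrange multiplier (LM) tests.
The Wald statistic is defined as:

$$W = \left( C(\hat{\theta}) - c \right)^\top \left( \frac{\partial C(\hat{\theta})}{\partial \theta^\top} \operatorname{var}(\hat{\theta}) \frac{\partial C(\hat{\theta})^\top}{\partial \theta} \right)^{-1} \left( C(\hat{\theta}) - c \right)$$

Under $H_0$, the Wald test is:
$$W \sim \chi^2(g)$$
The second approach is based on the likelihood ratio:
$$\Lambda = \frac{\mathcal{L}(\hat{\theta}_c \mid Y)}{\mathcal{L}(\hat{\theta} \mid Y)}$$

Under $H_0$, the LR test is:
$$-2 \ln \Lambda = -2 \left( \ell(\hat{\theta}_c \mid Y) - \ell(\hat{\theta} \mid Y) \right) \sim \chi^2(g)$$
The third approach uses the Lagrange multiplier statistic:
$$\mathrm{LM} = \frac{\partial \ell(\hat{\theta}_c)}{\partial \theta^\top} \operatorname{var}(\hat{\theta}_c) \frac{\partial \ell(\hat{\theta}_c)}{\partial \theta}$$

Under $H_0$, the distribution of the LM statistic is the chi-squared distribution $\chi^2(g)$. We
notice that the Wald test uses the unconstrained estimator $\hat{\theta}$ whereas the LM test uses the
restricted estimator $\hat{\theta}_c$.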


10.1.2.4 Some examples 
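Multivariate normal distribution We assume that $Y_i \sim \mathcal{N}_p(\mu, \Sigma)$. We have:

$$\ell(\mu, \Sigma) = -\frac{np}{2} \ln(2\pi) - \frac{n}{2} \ln|\Sigma| - \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu)^\top \Sigma^{-1} (Y_i - \mu)$$

The first-order condition with respect to $\mu$ is:
$$\partial_\mu \, \ell(\mu, \Sigma) = \sum_{i=1}^{n} \Sigma^{-1} (Y_i - \mu) = 0$$
Since $\sum_{i=1}^{n} \Sigma^{-1} (Y_i - \mu) = \Sigma^{-1} \sum_{i=1}^{n} (Y_i - \mu)$, we deduce that $\hat{\mu}$ is the empirical mean:

$$\hat{\mu} = n^{-1} \sum_{i=1}^{n} Y_i = \bar{Y}$$

By using the properties of the trace function, the concentrated log-likelihood function becomes:

$$\ell(\hat{\mu}, \Sigma) = -\frac{np}{2} \ln(2\pi) - \frac{n}{2} \ln|\Sigma| - \frac{1}{2} \sum_{i=1}^{n} (Y_i - \bar{Y})^\top \Sigma^{-1} (Y_i - \bar{Y})$$
$$= -\frac{np}{2} \ln(2\pi) - \frac{n}{2} \ln|\Sigma| - \frac{1}{2} \operatorname{trace}\left( \Sigma^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y})^\top \right)$$
$$= -\frac{np}{2} \ln(2\pi) - \frac{n}{2} \ln|\Sigma| - \frac{n}{2} \operatorname{trace}\left( \Sigma^{-1} S \right)$$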
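where $S$ is the $p \times p$ matrix defined in the following way:

$$S = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y})^\top$$

We deduce the first-order condition:

$$\frac{\partial \ell(\hat{\mu}, \Sigma)}{\partial \Sigma^{-1}} = \frac{n}{2} \Sigma - \frac{n}{2} S = 0$$

It follows that $\hat{\Sigma} = S$.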


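Extension of the linear regression By assuming that $y_i = x_i^\top \beta + u_i$ with $u_i \sim \mathcal{N}(0, \sigma_i^2)$, we obtain:

$$\ell(\beta, \sigma) = \sum_{i=1}^{n} \ln \Pr\{Y_i = y_i\} = \sum_{i=1}^{n} \ln \left( \frac{1}{\sqrt{2\pi} \sigma_i} \exp\left( -\frac{1}{2} \left( \frac{y_i - x_i^\top \beta}{\sigma_i} \right)^2 \right) \right)$$
$$= -\frac{n}{2} \ln 2\pi - \frac{1}{2} \sum_{i=1}^{n} \ln \sigma_i^2 - \frac{1}{2} \sum_{i=1}^{n} \left( \frac{y_i - x_i^\top \beta}{\sigma_i} \right)^2$$

In the homoscedastic case $\sigma_i = \sigma$, we can show that the estimators $\hat{\beta}_{\mathrm{ML}}$ and $\hat{\beta}_{\mathrm{OLS}}$ are the
same⁵. Let us now assume that:

$$\sigma_i^2 = \sigma^2 + z_i^\top \gamma$$

We obtain:
$$\ell(\beta, \sigma, \gamma) = -\frac{n}{2} \ln 2\pi - \frac{1}{2} \sum_{i=1}^{n} \ln\left( \sigma^2 + z_i^\top \gamma \right) - \frac{1}{2} \sum_{i=1}^{n} \frac{\left( y_i - x_i^\top \beta \right)^2}{\sigma^2 + z_i^\top \gamma}$$

This is an example of linear models with heteroscedastic residuals.

When the model is non-linear, $y_i = g(x_i, \beta) + u_i$ where $u_i \sim \mathcal{N}(0, \sigma^2)$, the log-likelihood
function becomes:

$$\ell(\beta, \sigma) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2 - \frac{1}{2} \sum_{i=1}^{n} \frac{\left( y_i - g(x_i, \beta) \right)^2}{\sigma^2}$$

Some examples of non-linear models are given in Table 10.5.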


10.1.2.5 EM algorithm 


The expectation-maximization (EM) algorithm is an iterative method to find the maximum
likelihood estimate when the statistical model depends on unobserved latent variables.


5See Exercise 10.3.4 on page 706. 
TABLE 10.5: Non-linear models


Model                   Function $g(x, \beta)$
Exponential (growth)    $y_i = \beta_1 e^{\beta_2 x_i} + u_i$


Exponential (decay) yi = y- + (y+ — y-) e787 + ui 

Hyperbola               $y_i = (\beta_1 x_i) / (\beta_2 + x_i)$

Sine                    $y_i = \beta_1 + \beta_2 \sin(\beta_3 x_i + \beta_4) + u_i$

Boltzmann yi = y- + (y+ — y-)/ (1 ae eaa) 
Beo E a Sim aed Pa) 


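We note $Y$ the sample of observed data and $Z$ the sample of unobservable data. We have:
$$\ell(Y, Z; \theta) = \sum_{i=1}^{n} \ln f(y_i, z_i; \theta) = \sum_{i=1}^{n} \ln \left( f(y_i; \theta) f(z_i \mid y_i; \theta) \right)$$
$$= \sum_{i=1}^{n} \ln f(y_i; \theta) + \sum_{i=1}^{n} \ln f(z_i \mid y_i; \theta) = \ell(Y; \theta) + \sum_{i=1}^{n} \ln f(z_i \mid y_i; \theta)$$

We deduce that:
$$\ell(Y; \theta) = \ell(Y, Z; \theta) - \sum_{i=1}^{n} \ln f(z_i \mid y_i; \theta)$$

Dempster et al. (1977) define the expected value of the log-likelihood function as follows:
$$Q\left(\theta; \theta^{(k)}\right) = \mathbb{E}\left[ \ell(Y, Z; \theta) \mid \theta^{(k)} \right]$$

where $\theta^{(k)}$ is the vector of parameters at iteration $k$. They show that under some conditions
the sequence of maxima $\theta^{(k+1)} = \arg\max_\theta Q\left(\theta; \theta^{(k)}\right)$ tends to a global maximum $\hat{\theta}_{\mathrm{EM}} = \theta^{(\infty)}$,
implying that $\ell(Y; \hat{\theta}_{\mathrm{EM}}) \ge \ell(Y; \theta)$ for all $\theta \in \Theta$. The EM algorithm consists in
iteratively applying the two steps:

(E-Step) we calculate the expected value of the log-likelihood function $\mathbb{E}\left[ \ell(Y, Z; \theta) \mid \theta^{(k)} \right]$ with
respect to the parameter vector $\theta^{(k)}$;

(M-Step) we estimate $\theta^{(k+1)}$ by maximizing $Q\left(\theta; \theta^{(k)}\right)$:

$$\theta^{(k+1)} = \arg\max_\theta Q\left(\theta; \theta^{(k)}\right)$$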


The EM algorithm is used to solve many statistical problems: missing data, grouping, 
censoring and truncation models, finite mixtures, variance components, factor analysis, 
hidden Markov models, switching Markov processes, etc. 




TABLE 10.6: Insulation life in hours at various test temperatures

Motorette       1       2       3       4       5
150°        8064*   8064*   8064*   8064*   8064*
170°         1764    2772    3444    3542    3780
190°          408     408    1344    1344    1440
220°          408     408     504     504     504

Motorette       6       7       8       9      10
150°        8064*   8064*   8064*   8064*   8064*
170°         4860    5196   5448*   5448*   5448*
190°        1680*   1680*   1680*   1680*   1680*
220°         528*    528*    528*    528*    528*


Source: Schmee and Hahn (1979). 
An asterisk * indicates that the test has been stopped without the failure of the motorette, 
implying that the observation is censored. 


Censored data Table 10.6 gives the results of temperature accelerated life tests on electrical
insulation in 40 motorettes. Ten motorettes were tested at each of four temperatures
(150°, 170°, 190° and 220°). The results are the following: all 10 motorettes at 150° are
still on test without failure at 8064 hours; 3 motorettes at 170° are still on test without
failure at 5448 hours; 5 motorettes at 190° are still on test without failure at 1680 hours;
5 motorettes at 220° are still on test without failure at 528 hours. We assume the following
model:
$$y_i = \beta_0 + \beta_1 x_i + \sigma \varepsilon_i$$

where $y_i = \log_{10} d_i$, $d_i$ is the failure time, $x_i = 1000 / (t_i + 273.2)$, $t_i$ is the temperature and
$\varepsilon_i \sim \mathcal{N}(0, 1)$. We cannot use the linear regression because some values of $y_i$ are censored. Let
$A$ and $B$ be the sets of non-censored and censored data. The expression of the log-likelihood
function is then:

$$\ell(\theta) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \left( \sum_{i \in A} (y_i - \beta_0 - \beta_1 x_i)^2 + \sum_{i \in B} (Z_i - \beta_0 - \beta_1 x_i)^2 \right)$$

where $Z_i$ is the failure time of the motorette $i$ that we have not observed. However, we
know that $Z_i > c_i$ where $c_i$ is the censored failure time. We deduce that:

$$\ell(\theta) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i \in A} (y_i - \beta_0 - \beta_1 x_i)^2 - \frac{1}{2\sigma^2} \sum_{i \in B} \mathbb{E}\left[ (Z_i - \beta_0 - \beta_1 x_i)^2 \mid Z_i > c_i \right] \qquad (10.12)$$

where:
$$\mathbb{E}\left[ (Z_i - \beta_0 - \beta_1 x_i)^2 \mid Z_i > c_i \right] = \mathbb{E}\left[ Z_i^2 \mid Z_i > c_i \right] - 2 (\beta_0 + \beta_1 x_i) \, \mathbb{E}\left[ Z_i \mid Z_i > c_i \right] + (\beta_0 + \beta_1 x_i)^2 \qquad (10.13)$$
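Tanner (1993) showed that:

$$\mathbb{E}\left[ Z_i \mid Z_i > c_i \right] = \mu_i + \sigma \, \frac{\phi\left( (c_i - \mu_i) / \sigma \right)}{1 - \Phi\left( (c_i - \mu_i) / \sigma \right)}$$

and:

$$\mathbb{E}\left[ Z_i^2 \mid Z_i > c_i \right] = \mu_i^2 + \sigma^2 + \sigma (c_i + \mu_i) \, \frac{\phi\left( (c_i - \mu_i) / \sigma \right)}{1 - \Phi\left( (c_i - \mu_i) / \sigma \right)}$$

where $\mu_i = \beta_0 + \beta_1 x_i$, and $\phi$ and $\Phi$ are the density and distribution functions of the
standard normal distribution. The EM algorithm is then:

(E-Step) we calculate $\mathbb{E}\left[ Z_i^2 \mid Z_i > c_i \right]$ and $\mathbb{E}\left[ Z_i \mid Z_i > c_i \right]$ using the values $\beta_0^{(k)}$, $\beta_1^{(k)}$ and $\sigma^{(k)}$;
we deduce the conditional expectation (10.13);

(M-Step) we estimate $\beta_0^{(k+1)}$, $\beta_1^{(k+1)}$ and $\sigma^{(k+1)}$ by maximizing the conditional log-likelihood
function (10.12).

Starting from the initial values $\beta_0^{(0)} = \beta_1^{(0)} = \sigma^{(0)} = 1$, we obtain $\beta_0^{(1)} = -5.087$, $\beta_1^{(1)} = 4.008$
and $\sigma^{(1)} = 0.619$ at the first iteration, $\beta_0^{(2)} = -6.583$, $\beta_1^{(2)} = 4.670$ and $\sigma^{(2)} = 0.515$ at
the second iteration, etc. Finally, the algorithm converges after 33 iterations and the EM
estimates are $\hat{\beta}_0 = -6.019$, $\hat{\beta}_1 = 4.311$ and $\hat{\sigma} = 0.259$. In Table 10.7, we also report the
value taken by the expected failure time $\mathbb{E}\left[ Z_i \mid Z_i > c_i \right]$ at the last iteration.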
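The E-step and M-step of this censored regression can be sketched as follows. This is a minimal illustration that assumes NumPy and SciPy are available. The function name `em_censored`, the stopping rule and the iteration cap are assumptions, and the conditional moments are the standard truncated normal moments. The M-step is an ordinary least squares fit on the completed data, followed by the variance update implied by (10.12) and (10.13).

```python
import numpy as np
from scipy.stats import norm

# Data of Table 10.6 (an asterisk in the table corresponds to censored = True).
temps = np.repeat([150.0, 170.0, 190.0, 220.0], 10)
hours = np.array([8064] * 10
                 + [1764, 2772, 3444, 3542, 3780, 4860, 5196, 5448, 5448, 5448]
                 + [408, 408, 1344, 1344, 1440, 1680, 1680, 1680, 1680, 1680]
                 + [408, 408, 504, 504, 504, 528, 528, 528, 528, 528], dtype=float)
censored = np.array([True] * 10
                    + [False] * 7 + [True] * 3
                    + [False] * 5 + [True] * 5
                    + [False] * 5 + [True] * 5)
x = 1000.0 / (temps + 273.2)
y = np.log10(hours)  # observed value, or censoring threshold c_i when censored

def em_censored(x, y, censored, b0=1.0, b1=1.0, sigma=1.0, tol=1e-10, max_iter=5000):
    X = np.column_stack([np.ones_like(x), x])
    for _ in range(max_iter):
        # E-step: conditional moments of Z_i given Z_i > c_i
        mu = b0 + b1 * x
        a = (y - mu) / sigma
        hazard = norm.pdf(a) / norm.sf(a)
        ez = np.where(censored, mu + sigma * hazard, y)
        ez2 = np.where(censored, mu**2 + sigma**2 + sigma * (y + mu) * hazard, y**2)
        # M-step: least squares on completed data, then variance update
        coef = np.linalg.lstsq(X, ez, rcond=None)[0]
        mu_new = X @ coef
        sigma_new = np.sqrt(np.mean(ez2 - 2.0 * mu_new * ez + mu_new**2))
        change = max(abs(coef[0] - b0), abs(coef[1] - b1), abs(sigma_new - sigma))
        b0, b1, sigma = coef[0], coef[1], sigma_new
        if change < tol:
            break
    return b0, b1, sigma

b0_hat, b1_hat, sigma_hat = em_censored(x, y, censored)
print(b0_hat, b1_hat, sigma_hat)
```

The number of iterations needed depends on the stopping rule, so it need not match the 33 iterations quoted in the text.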


TABLE 10.7: Expected failure time $\mathbb{E}\left[ Z_i \mid Z_i > c_i \right]$ obtained with the EM algorithm

Motorette        1        2        3        4        5
150°        17447*   17447*   17447*   17447*   17447*
170°          1764     2772     3444     3542     3780
190°           408      408     1344     1344     1440
220°           408      408      504      504      504

Motorette        6        7        8        9       10
150°        17447*   17447*   17447*   17447*   17447*
170°          4860     5196    8574*    8574*    8574*
190°         2862*    2862*    2862*    2862*    2862*
220°          850*     850*     850*     850*     850*

The censored data represented by an asterisk * are replaced by the value of $\mathbb{E}\left[ Z_i \mid Z_i > c_i \right]$
calculated by the EM algorithm at the last iteration.


Multivariate Gaussian mixture model The probability density function of the random
vector $Y$ of dimension $K$ is defined as a weighted sum of Gaussian distributions:

$$f(y) = \sum_{j=1}^{m} \pi_j \phi_K(y; \mu_j, \Sigma_j)$$

where $m$ is the number of mixture components, $\mu_j$ and $\Sigma_j$ are the mean vector and the
covariance matrix of the Gaussian distribution associated with the $j$-th component, and $\pi_j$
is the mixture weight such that $\sum_{j=1}^{m} \pi_j = 1$. The log-likelihood function of the sample
$Y = \{Y_1, \ldots, Y_n\}$ is:

$$\ell(\theta) = \sum_{i=1}^{n} \ln \sum_{j=1}^{m} \pi_j \phi_K(Y_i; \mu_j, \Sigma_j)$$
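The derivative of $\ell(\theta)$ with respect to $\mu_j$ is equal to:

$$\frac{\partial \ell(\theta)}{\partial \mu_j} = \sum_{i=1}^{n} \frac{\pi_j \phi_K(Y_i; \mu_j, \Sigma_j)}{\sum_{s=1}^{m} \pi_s \phi_K(Y_i; \mu_s, \Sigma_s)} \Sigma_j^{-1} (Y_i - \mu_j)$$

Therefore, the first-order condition is:

$$\sum_{i=1}^{n} \pi_{j,i} \Sigma_j^{-1} (Y_i - \mu_j) = 0$$
where:
$$\pi_{j,i} = \frac{\pi_j \phi_K(Y_i; \mu_j, \Sigma_j)}{\sum_{s=1}^{m} \pi_s \phi_K(Y_i; \mu_s, \Sigma_s)}$$
We deduce the expression of the estimator $\hat{\mu}_j$:

$$\hat{\mu}_j = \frac{\sum_{i=1}^{n} \pi_{j,i} Y_i}{\sum_{i=1}^{n} \pi_{j,i}} \qquad (10.14)$$

For the derivative with respect to $\Sigma_j$, we consider the function $g\left(\Sigma_j^{-1}\right)$ defined as follows:

$$g\left(\Sigma_j^{-1}\right) = \frac{1}{(2\pi)^{K/2} |\Sigma_j|^{1/2}} \, e^{-\frac{1}{2} (Y_i - \mu_j)^\top \Sigma_j^{-1} (Y_i - \mu_j)} = \frac{\left|\Sigma_j^{-1}\right|^{1/2}}{(2\pi)^{K/2}} \, e^{-\frac{1}{2} \operatorname{trace}\left( \Sigma_j^{-1} (Y_i - \mu_j)(Y_i - \mu_j)^\top \right)}$$

We note:

$$\varphi\left(\Sigma_j^{-1}\right) = \exp\left( -\frac{1}{2} \operatorname{trace}\left( \Sigma_j^{-1} (Y_i - \mu_j)(Y_i - \mu_j)^\top \right) \right)$$

It follows that⁶:

$$\frac{\partial g\left(\Sigma_j^{-1}\right)}{\partial \Sigma_j^{-1}} = \frac{1}{2} \frac{\left|\Sigma_j^{-1}\right|^{-1/2} \left|\Sigma_j^{-1}\right| \Sigma_j}{(2\pi)^{K/2}} \varphi\left(\Sigma_j^{-1}\right) - \frac{1}{2} \frac{\left|\Sigma_j^{-1}\right|^{1/2}}{(2\pi)^{K/2}} \varphi\left(\Sigma_j^{-1}\right) (Y_i - \mu_j)(Y_i - \mu_j)^\top$$
$$= \frac{1}{2} \, g\left(\Sigma_j^{-1}\right) \left( \Sigma_j - (Y_i - \mu_j)(Y_i - \mu_j)^\top \right)$$

We deduce that:

$$\frac{\partial \ell(\theta)}{\partial \Sigma_j^{-1}} = \frac{1}{2} \sum_{i=1}^{n} \frac{\pi_j \phi_K(Y_i; \mu_j, \Sigma_j)}{\sum_{s=1}^{m} \pi_s \phi_K(Y_i; \mu_s, \Sigma_s)} \left( \Sigma_j - (Y_i - \mu_j)(Y_i - \mu_j)^\top \right)$$

⁶We use the following results:
$$\frac{\partial |A|}{\partial A} = |A| \left(A^{-1}\right)^\top \quad \text{and} \quad \frac{\partial \operatorname{trace}\left(A^\top B\right)}{\partial A} = B$$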
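The first-order condition is then:
$$\sum_{i=1}^{n} \pi_{j,i} \left( \Sigma_j - (Y_i - \mu_j)(Y_i - \mu_j)^\top \right) = 0$$

It follows that the estimator $\hat{\Sigma}_j$ is equal to:

$$\hat{\Sigma}_j = \frac{\sum_{i=1}^{n} \pi_{j,i} (Y_i - \hat{\mu}_j)(Y_i - \hat{\mu}_j)^\top}{\sum_{i=1}^{n} \pi_{j,i}} \qquad (10.15)$$

Regarding the mixture probabilities $\pi_j$, the first-order condition implies:

$$\sum_{i=1}^{n} \frac{\phi_K(Y_i; \mu_j, \Sigma_j)}{\sum_{s=1}^{m} \pi_s \phi_K(Y_i; \mu_s, \Sigma_s)} = \lambda$$

where $\lambda$ is the Lagrange multiplier associated to the constraint $\sum_{j=1}^{m} \pi_j = 1$. We deduce
that $\lambda = n$. We conclude that it is not possible to directly define the estimator $\hat{\pi}_j$. This is
why we have to use another route to obtain the ML estimator.

We introduce the estimator $\hat{\pi}_{j,i}$:
$$\hat{\pi}_{j,i} = \frac{\hat{\pi}_j \phi_K(Y_i; \hat{\mu}_j, \hat{\Sigma}_j)}{\sum_{s=1}^{m} \hat{\pi}_s \phi_K(Y_i; \hat{\mu}_s, \hat{\Sigma}_s)} \qquad (10.16)$$

$\hat{\pi}_{j,i}$ is the posterior probability of the regime index for the observation $i$. Knowing $\hat{\pi}_{j,i}$, the
estimator $\hat{\pi}_j$ is given by:

$$\hat{\pi}_j = \frac{\sum_{i=1}^{n} \hat{\pi}_{j,i}}{n} \qquad (10.17)$$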
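The EM algorithm consists in the following iterations:

1. we set $k = 0$ and initialize the algorithm with starting values $\pi_j^{(0)}$, $\mu_j^{(0)}$ and $\Sigma_j^{(0)}$;

2. using Equation (10.16), we calculate the posterior probabilities $\pi_{j,i}^{(k)}$:

$$\pi_{j,i}^{(k)} = \frac{\pi_j^{(k)} \phi_K\left(Y_i; \mu_j^{(k)}, \Sigma_j^{(k)}\right)}{\sum_{s=1}^{m} \pi_s^{(k)} \phi_K\left(Y_i; \mu_s^{(k)}, \Sigma_s^{(k)}\right)}$$

3. using Equations (10.14), (10.15) and (10.17), we update the estimators $\hat{\pi}_j$, $\hat{\mu}_j$ and $\hat{\Sigma}_j$:

$$\pi_j^{(k+1)} = \frac{\sum_{i=1}^{n} \pi_{j,i}^{(k)}}{n}$$
$$\mu_j^{(k+1)} = \frac{\sum_{i=1}^{n} \pi_{j,i}^{(k)} Y_i}{\sum_{i=1}^{n} \pi_{j,i}^{(k)}}$$
$$\Sigma_j^{(k+1)} = \frac{\sum_{i=1}^{n} \pi_{j,i}^{(k)} \left(Y_i - \mu_j^{(k+1)}\right) \left(Y_i - \mu_j^{(k+1)}\right)^\top}{\sum_{i=1}^{n} \pi_{j,i}^{(k)}}$$

4. we iterate steps 2 and 3 until convergence;

5. finally, we have $\hat{\pi}_j = \pi_j^{(\infty)}$, $\hat{\mu}_j = \mu_j^{(\infty)}$ and $\hat{\Sigma}_j = \Sigma_j^{(\infty)}$.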
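The iterations above translate directly into array operations. The sketch below is an illustration on simulated data, not on the S&P 500 returns discussed next. It assumes NumPy and SciPy are available; the function name `em_gaussian_mixture`, the starting values, the stopping rule on the log-likelihood and the synthetic sample are all assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(Y, pi, mu, sigma, max_iter=500, tol=1e-8):
    """EM iterations for a Gaussian mixture with m components in dimension K."""
    n = Y.shape[0]
    pi = np.array(pi, dtype=float)
    mu = np.array(mu, dtype=float)
    sigma = np.array(sigma, dtype=float)
    m = pi.size
    log_lik = []
    for _ in range(max_iter):
        # Step 2: posterior probabilities pi_{j,i} (n x m matrix)
        dens = np.column_stack([
            pi[j] * multivariate_normal.pdf(Y, mean=mu[j], cov=sigma[j])
            for j in range(m)])
        log_lik.append(np.log(dens.sum(axis=1)).sum())
        post = dens / dens.sum(axis=1, keepdims=True)
        # Step 3: update the weights, the means and the covariance matrices
        nj = post.sum(axis=0)
        pi = nj / n
        mu = (post.T @ Y) / nj[:, None]
        for j in range(m):
            d = Y - mu[j]
            sigma[j] = (post[:, j, None] * d).T @ d / nj[j]
        # Step 4: stop when the log-likelihood no longer increases
        if len(log_lik) > 1 and log_lik[-1] - log_lik[-2] < tol:
            break
    return pi, mu, sigma, np.array(log_lik)

# Synthetic sample (hypothetical): two well-separated bivariate clusters.
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(-3.0, 1.0, size=(200, 2)),
               rng.normal(3.0, 1.0, size=(100, 2))])
pi_hat, mu_hat, sigma_hat, path = em_gaussian_mixture(
    Y, pi=[0.5, 0.5], mu=[[-1.0, -1.0], [1.0, 1.0]], sigma=[np.eye(2), np.eye(2)])
print(pi_hat, mu_hat)
```

A useful sanity check for any EM implementation is that the recorded log-likelihood never decreases from one iteration to the next.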

Let us consider the monthly returns of the S&P 500 index from January 2000 to December 2015.
Using a Gaussian model, we obtain $\hat{\mu} = 0.49\%$ and $\hat{\sigma} = 4.90\%$. If we consider
a bivariate mixture model, we obtain the following estimates:

Regime    $\hat{\pi}_j$    $\hat{\mu}_j$    $\hat{\sigma}_j$
j = 1     72.01%           1.40%            3.13%
j = 2     27.99%           -1.84%           7.28%

We have represented the corresponding probability density functions in Figure 10.3. We
notice that the Gaussian and mixture pdfs are very different, even if they have the same
mean and variance⁷. However, the skewness coefficients⁸ are very different. For the Gaussian
distribution, $\gamma_1$ is equal to zero, whereas we have $\gamma_1 = -0.745$ for the mixture distribution.
[Figure: probability density functions of Regime 1, Regime 2, the mixture and the Gaussian model, plotted against the return (in %)]

FIGURE 10.3: Probability density function of the monthly returns of the S&P 500 index


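⁷The first two moments of the mixture distribution are:
$$\mathbb{E}[Y] = \pi_1 \mu_1 + \pi_2 \mu_2$$
and:
$$\operatorname{var}(Y) = \pi_1 \sigma_1^2 + \pi_2 \sigma_2^2 + \pi_1 \pi_2 (\mu_1 - \mu_2)^2$$

⁸The expression of the skewness coefficient is:

$$\gamma_1(Y) = \frac{\pi_1 \pi_2 \left( (\pi_2 - \pi_1)(\mu_1 - \mu_2)^3 + 3 (\mu_1 - \mu_2)\left(\sigma_1^2 - \sigma_2^2\right) \right)}{\left( \pi_1 \sigma_1^2 + \pi_2 \sigma_2^2 + \pi_1 \pi_2 (\mu_1 - \mu_2)^2 \right)^{3/2}}$$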
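As a quick numerical check of the two footnotes, the moment formulas can be evaluated with the estimated parameters of the two regimes (figures in percent). The variable names in this sketch are hypothetical; the values are those of the table above.

```python
import numpy as np

# Estimated regimes of the S&P 500 mixture model (values in percent)
pi = np.array([0.7201, 0.2799])
mu = np.array([1.40, -1.84])
sigma = np.array([3.13, 7.28])

delta = mu[0] - mu[1]
mean = pi @ mu
variance = pi @ sigma**2 + pi[0] * pi[1] * delta**2
third_moment = pi[0] * pi[1] * ((pi[1] - pi[0]) * delta**3
                                + 3.0 * delta * (sigma[0]**2 - sigma[1]**2))
skewness = third_moment / variance**1.5
print(mean, np.sqrt(variance), skewness)
```

The mean and the standard deviation agree with the Gaussian estimates (0.49% and 4.90%), while the skewness is close to the value of -0.745 quoted in the text.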
10.1.3 Generalized method of moments


The method of moments is another approach for estimating a statistical model. While 
the objective of the method of maximum likelihood is to maximize the probability of the 
sample data, the method of moments estimates the parameters by fitting the empirical 
moments of the sample data. The choice of one approach rather than another generally 
depends on the computational facilities associated to the probability density function and 
the statistical moments. The method of moments is particularly suitable for some financial 
models that cannot be described by an analytical probability distribution. 


10.1.3.1 Method of moments 


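Let $Y$ be a random variable, whose probability distribution $F(y; \theta)$ depends on some
parameters $\theta$. We assume that we can calculate the first $m$ statistical moments:

$$m_j(\theta) = \mathbb{E}\left[ Y^j \right] = \int y^j \, \mathrm{d}F(y; \theta)$$

We consider a sample $Y = \{y_1, \ldots, y_n\}$ and we note $g(\theta)$ the $m \times 1$ vector, whose elements
are equal to:

$$g_j(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i^j - m_j(\theta) \right) = \left( \frac{1}{n} \sum_{i=1}^{n} y_i^j \right) - m_j(\theta)$$

Let $K$ be the dimension of $\theta$. If $K$ is exactly equal to $m$, the method of moments (MM)
estimator is defined by:
$$g(\hat{\theta}) = 0$$

If $K < m$, the MM estimator minimizes the quadratic criterion:

$$\hat{\theta} = \arg\min Q(\theta)$$

where:
$$Q(\theta) = g(\theta)^\top W g(\theta)$$

and $W$ is an $m \times m$ matrix.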
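Example 105 We assume that $Y \sim \mathcal{N}(\mu, \sigma^2)$.

We have:

$$m_1(\theta) = \mathbb{E}[Y] = \mu$$

and:

$$m_2(\theta) = \mathbb{E}\left[ Y^2 \right] = \mu^2 + \sigma^2$$

It follows that the MM estimator $\hat{\theta} = (\hat{\mu}, \hat{\sigma})$ satisfies:

$$\frac{1}{n} \sum_{i=1}^{n} y_i - \hat{\mu} = 0 \quad \text{and} \quad \frac{1}{n} \sum_{i=1}^{n} y_i^2 - \hat{\mu}^2 - \hat{\sigma}^2 = 0$$

We deduce that:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
and:

$$\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} y_i^2 - \hat{\mu}^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu})^2}$$

The MM estimators $\hat{\mu}$ and $\hat{\sigma}$ correspond to the empirical mean and the empirical standard
deviation.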


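Example 106 We now assume that $Y = X \cdot U$ where $X \sim \mathcal{N}(\mu, \sigma^2)$, $U \sim \mathcal{U}_{[0,1]}$
and $X \perp U$. We want to estimate the parameters $\mu$ and $\sigma$ for the sample $Y =$
{−0.320, −0.262, −0.284, −0.296, 0.636, 0.547, 0.024, 0.483, −1.045, −0.030}.

We have:

$$\mathbb{E}[Y] = \mathbb{E}[X \cdot U] = \mathbb{E}[X] \cdot \mathbb{E}[U] = \frac{\mu}{2}$$

and:

$$\mathbb{E}\left[ Y^2 \right] = \mathbb{E}\left[ X^2 \right] \cdot \mathbb{E}\left[ U^2 \right] = \left( \mu^2 + \sigma^2 \right) \left( \frac{1}{12} + \frac{1}{4} \right) = \frac{\mu^2 + \sigma^2}{3}$$

We deduce that the MM estimators $\hat{\mu}$ and $\hat{\sigma}$ are:

$$\hat{\mu} = \frac{2}{n} \sum_{i=1}^{n} y_i$$

and:

$$\hat{\sigma} = \sqrt{\frac{3}{n} \sum_{i=1}^{n} y_i^2 - \hat{\mu}^2}$$

Using the sample $Y$, we obtain $\hat{\mu} = -0.109$ and $\hat{\sigma} = 0.836$.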


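Example 107 (Modeling LGD with the beta distribution) We consider Example
104 on page 618, but we now want to estimate the parameters of the beta distribution by the
method of moments.

If $Y \sim \mathcal{B}(\alpha, \beta)$, we have:

$$\mathbb{E}[Y] = \frac{\alpha}{\alpha + \beta}$$

and:

$$\operatorname{var}(Y) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

Let $\hat{\mu}_{\mathrm{LGD}}$ and $\hat{\sigma}_{\mathrm{LGD}}$ be the empirical mean and standard deviation of the LGD sample. We
deduce that the MM estimators are:

$$\hat{\alpha}_{\mathrm{MM}} = \frac{\hat{\mu}_{\mathrm{LGD}}^2 \left(1 - \hat{\mu}_{\mathrm{LGD}}\right)}{\hat{\sigma}_{\mathrm{LGD}}^2} - \hat{\mu}_{\mathrm{LGD}}$$

and:

$$\hat{\beta}_{\mathrm{MM}} = \frac{\hat{\mu}_{\mathrm{LGD}} \left(1 - \hat{\mu}_{\mathrm{LGD}}\right)^2}{\hat{\sigma}_{\mathrm{LGD}}^2} - \left(1 - \hat{\mu}_{\mathrm{LGD}}\right)$$

Using our sample, we obtain $\hat{\mu}_{\mathrm{LGD}} = 59.77\%$ and $\hat{\sigma}_{\mathrm{LGD}} = 27.02\%$. Therefore, the MM
estimates are $\hat{\alpha}_{\mathrm{MM}} = 1.371$ and $\hat{\beta}_{\mathrm{MM}} = 0.923$. We recall that the ML estimates were
$\hat{\alpha}_{\mathrm{ML}} = 1.836$ and $\hat{\beta}_{\mathrm{ML}} = 1.248$. If we compare the two calibrated probability distributions,
we observe that their shape is very different (see Figure 10.4).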
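The method of moments estimates of Example 107 only require the empirical mean and standard deviation. The sketch below is an illustration with hypothetical variable names. It uses the sample standard deviation with the $n - 1$ divisor, which is an assumption made here because it reproduces the value of 27.02% quoted in the text.

```python
import numpy as np

lgd = np.array([0.68, 0.90, 0.22, 0.45, 0.17, 0.25, 0.89,
                0.65, 0.75, 0.56, 0.87, 0.92, 0.46])

mu_lgd = lgd.mean()
sigma_lgd = lgd.std(ddof=1)  # n - 1 divisor (assumption, see the lead-in)

# Both estimators share the factor mu (1 - mu) / sigma^2 - 1
common = mu_lgd * (1.0 - mu_lgd) / sigma_lgd**2 - 1.0
alpha_mm = mu_lgd * common
beta_mm = (1.0 - mu_lgd) * common
print(mu_lgd, sigma_lgd, alpha_mm, beta_mm)
```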
[Figure: calibrated beta density functions obtained by maximum likelihood and by the method of moments]

FIGURE 10.4: Calibrated density function of the loss given default


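Let us assume that $Y \sim \mathcal{E}(\lambda)$. We have $m_1(\lambda) = \lambda^{-1}$ and $m_2(\lambda) = 2\lambda^{-2}$. We deduce
that:
$$Q(\lambda) = w_1 \left( \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{\lambda} \right)^2 + 2 w_2 \left( \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{\lambda} \right) \left( \frac{1}{n} \sum_{i=1}^{n} y_i^2 - \frac{2}{\lambda^2} \right) + w_3 \left( \frac{1}{n} \sum_{i=1}^{n} y_i^2 - \frac{2}{\lambda^2} \right)^2$$
where:
$$W = \begin{pmatrix} w_1 & w_2 \\ w_2 & w_3 \end{pmatrix}$$
If $w_2 = w_3 = 0$, the MM estimator is:
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} y_i} = \frac{1}{\bar{y}}$$

If $w_1 = w_2 = 0$, the MM estimator becomes:

$$\hat{\lambda} = \sqrt{\frac{2n}{\sum_{i=1}^{n} y_i^2}}$$

In the other cases, we have to use a numerical optimization algorithm to find $\hat{\lambda}$. We consider
the following sample: 0.08, 0.14, 0.00, 0.06, 0.11, 0.22, 0.11, 0.09, 0.02 and 0.26. Using the
weighting scheme $w_1 = \omega^2$, $w_2 = \omega (1 - \omega)$ and $w_3 = (1 - \omega)^2$ where $\omega \in [0, 1]$, we obtain
the following MM estimates:

$\omega$            0.00    0.20    0.50    0.70    1.00
$\hat{\lambda}$    10.59    9.99    9.54    9.35    9.17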
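The weighted criterion can be minimized with a one-dimensional optimizer. The sketch below is an illustration that assumes SciPy is available. The function names and the search interval $[1, 50]$ are assumptions. It evaluates $Q(\lambda)$ for the weighting scheme of the example and minimizes it for a few values of $\omega$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([0.08, 0.14, 0.00, 0.06, 0.11, 0.22, 0.11, 0.09, 0.02, 0.26])

def criterion(lam, omega):
    """Quadratic criterion Q(lambda) = g' W g for the exponential distribution."""
    g = np.array([y.mean() - 1.0 / lam, (y**2).mean() - 2.0 / lam**2])
    W = np.array([[omega**2, omega * (1.0 - omega)],
                  [omega * (1.0 - omega), (1.0 - omega)**2]])
    return g @ W @ g

def mm_estimate(omega):
    res = minimize_scalar(criterion, args=(omega,), bounds=(1.0, 50.0),
                          method="bounded", options={"xatol": 1e-10})
    return res.x

estimates = {omega: mm_estimate(omega) for omega in (0.0, 0.5, 1.0)}
print(estimates)
```

The two limiting cases can be compared with the closed-form expressions: $\omega = 1$ gives $1 / \bar{y}$ and $\omega = 0$ gives $\sqrt{2n / \sum y_i^2}$.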


10.1.3.2 Extension to the GMM approach 


The classical method of moments assumes that the number m of moments is equal to 
the number K of parameters and the weight matrix W is the identity matrix. Hansen 
(1982) extends this approach in two ways. First, he assumes that g (0) does not necessarily 
correspond to the first m statistical moments, but can also include orthogonal conditions. 
Second, the matrix W is chosen in order to obtain an estimate Ê with the smallest variance. 
Therefore, the generalized method of moments (GMM) is a direct extension of the method 
of moments. 


Statistical inference Like the MM approach, the GMM approach considers $m$ empirical
centered moments that depend on the parameter vector $\theta$. We note $h_{i,j}(\theta)$ the $j$-th moment
condition for the observation $i$ such that $\mathbb{E}\left[ h_{i,j}(\theta_0) \right] = 0$ where $\theta_0$ is the true parameter
vector. We note $g(\theta)$ the vector whose elements are:

$$g_j(\theta) = \frac{1}{n} \sum_{i=1}^{n} h_{i,j}(\theta)$$
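The GMM estimator is defined as:
$$\hat{\theta} = \arg\min g(\theta)^\top W g(\theta)$$

where $W$ is the weighting symmetric matrix. Hansen (1982) shows that $\hat{\theta}$ is asymptotically
normally distributed:

$$\sqrt{n} \left( \hat{\theta} - \theta_0 \right) \to \mathcal{N}(0, V)$$

where:
$$V = \left( D^\top W D \right)^{-1} D^\top W S W D \left( D^\top W D \right)^{-1}$$

$D$ is the Jacobian matrix of $g(\theta)$, and $S$ is the covariance matrix of empirical moments:

$$S = \lim_{n \to \infty} n \cdot \mathbb{E}\left[ g(\theta_0) g(\theta_0)^\top \right] = \lim_{n \to \infty} \mathbb{E}\left[ h(\theta_0) h(\theta_0)^\top \right]$$

We can also show that the optimal weighting matrix corresponds to the case $W = S^{-1}$. The
underlying idea is that moments with small variance are more informative than moments
with high variance. Therefore, they should have a larger weight. We then deduce that
$V = \left( D^\top S^{-1} D \right)^{-1}$ and $\operatorname{var}(\hat{\theta}) = \left( n D^\top S^{-1} D \right)^{-1}$.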


We notice that the quadratic form $Q(\theta) = g(\theta)^\top W g(\theta)$ is particular since the weighting
matrix $W = S^{-1}$ depends on the parameter vector $\theta$. A direct optimization of $Q(\theta)$ does
not generally converge. This is why we can use the following iterative algorithm:

1. let $W^{(0)}$ be the initial weighting matrix;

2. at iteration $k$, we find the optimal value $\hat{\theta}^{(k)}$:

$$\hat{\theta}^{(k)} = \arg\min g(\theta)^\top W^{(k-1)} g(\theta)$$

3. we update the weighting matrix $W^{(k)} = \left( \hat{S}^{(k)} \right)^{-1}$ where:

$$\hat{S}^{(k)} = \frac{1}{n} \sum_{i=1}^{n} h_i\left( \hat{\theta}^{(k)} \right) h_i\left( \hat{\theta}^{(k)} \right)^\top$$

4. we repeat steps 2 and 3 until convergence: $\left\| \hat{\theta}^{(k)} - \hat{\theta}^{(k-1)} \right\| \le \varepsilon$.

Remark 122 The two-step GMM procedure of Hansen (1982) consists in setting $W^{(0)}$ to
the identity matrix and stopping at the second iteration. In this case, we obtain $\hat{\theta}_{\mathrm{GMM}} = \hat{\theta}^{(2)}$.


We consider the previous example. We have $h_{i,1}(\lambda) = y_i - \lambda^{-1}$ and $h_{i,2}(\lambda) = y_i^2 - 2\lambda^{-2}$.
At the first iteration, we set $W^{(0)} = I_2$ and we obtain $\hat{\lambda}^{(1)} = 9.36$. We deduce that:


g0) Ln (A) h CON =1074. ( "a8 ) 


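At the second iteration, the weighting matrix becomes $W^{(1)} = \left( \hat{S}^{(1)} \right)^{-1}$, implying that the
solution is $\hat{\lambda}^{(2)} = 11.49$. Finally, we obtain $\hat{\lambda}^{(3)} = 12.04$, $\hat{\lambda}^{(4)} = 12.13$, $\hat{\lambda}^{(5)} = 12.14$ and
$\hat{\lambda}^{(6)} = 12.14$. The algorithm has then converged after 6 iterations and we have $\hat{\lambda}_{\mathrm{GMM}} =$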

12.14. We also obtain: 
D= 67.87 
~ \ 22.36 


S= Ln (cum) h (iamm) =10%. ( ei ey ) 


It follows that the standard deviation of $\hat{\lambda}_{\mathrm{GMM}}$ is equal to 2.81.


If $m > K$, then there are more orthogonality conditions than parameters to estimate. We
don't have $g(\hat{\theta}) = 0$, meaning that the model is over-identified. In order to test the
over-identifying constraints, we consider the J-test. Under the null hypothesis $H_0 : \mathbb{E}\left[ h_{i,j}(\theta_0) \right] = 0$,
we have:

$$J = n Q(\hat{\theta}) = n \, g(\hat{\theta})^\top W g(\hat{\theta}) \to \chi^2(g)$$
where $g = m - K$ is the number of over-identifying conditions. In the case of the previous
example, we have $Q(\hat{\lambda}) = 0.2336$ and $J = 2.336$. It follows that the p-value is equal to
12.64%. Since this p-value is larger than 10%, we cannot reject the null hypothesis at the 90% confidence level.
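The iterated GMM procedure and the J-test for the exponential example can be sketched as follows. This is an illustration under stated assumptions, not the code behind the reported figures: it assumes SciPy is available, estimates $S$ by the uncentered sample average of $h_i h_i^\top$, and uses a bounded scalar optimizer on the hypothetical interval $[1, 50]$. The function names and the stopping rule are also assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

y = np.array([0.08, 0.14, 0.00, 0.06, 0.11, 0.22, 0.11, 0.09, 0.02, 0.26])
n = y.size

def moments(lam):
    """n x 2 matrix of moment conditions h_{i,1} and h_{i,2}."""
    return np.column_stack([y - 1.0 / lam, y**2 - 2.0 / lam**2])

def criterion(lam, W):
    g = moments(lam).mean(axis=0)
    return g @ W @ g

def gmm_step(W):
    res = minimize_scalar(criterion, args=(W,), bounds=(1.0, 50.0),
                          method="bounded", options={"xatol": 1e-10})
    return res.x

W = np.eye(2)                      # step 1: W^(0) is the identity matrix
path = []
for k in range(100):
    lam = gmm_step(W)              # step 2: minimize g' W g
    path.append(lam)
    h = moments(lam)
    S = h.T @ h / n                # step 3: update S and the weighting matrix
    W_next = np.linalg.inv(S)
    if k > 0 and abs(path[-1] - path[-2]) < 1e-6:   # step 4: convergence
        break
    W = W_next

lam_gmm = path[-1]
J = n * criterion(lam_gmm, W)      # J-test with one over-identifying condition
p_value = chi2.sf(J, df=1)
print(path[:3], lam_gmm, J, p_value)
```

Because the estimate of $S$ is nearly singular for such a small sample, the later iterates and the J statistic are sensitive to how $S$ is estimated, so small differences with the figures quoted in the text should be expected.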


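The method of instrumental variables Let us consider the linear regression:

$$y_i = \sum_{k=1}^{K} \beta_k x_{i,k} + u_i$$

Since we have $\mathbb{E}[u_i] = 0$ and $\operatorname{var}(u_i) = \sigma^2$, we deduce that:

$$h_{i,1}(\theta) = u_i = y_i - \sum_{k=1}^{K} \beta_k x_{i,k} \qquad (10.18)$$

and:

$$h_{i,2}(\theta) = u_i^2 - \sigma^2 = \left( y_i - \sum_{k=1}^{K} \beta_k x_{i,k} \right)^2 - \sigma^2 \qquad (10.19)$$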
where the vector of parameters is equal to $(\beta, \sigma)$. The number of parameters is then equal
to $K + 1$. If $K > 1$, we need to find other orthogonal conditions. We recall that the linear
model assumes that the residuals are orthogonal to exogenous variables. This implies that:

$$h_{i,2+k}(\theta) = u_i x_{i,k} = \left( y_i - \sum_{k'=1}^{K} \beta_{k'} x_{i,k'} \right) x_{i,k} \qquad (10.20)$$

We obtain a system of $K + 2$ moments for $K + 1$ parameters. The model is over-identified
except if the linear model contains an intercept. In this case, the first and third moments
are the same because $x_{i,1} = 1$, and we obtain:

$$\hat{\theta}_{\mathrm{GMM}} = \hat{\theta}_{\mathrm{ML}} = \begin{pmatrix} \hat{\beta}_{\mathrm{OLS}} \\ \hat{\sigma}_{\mathrm{ML}} \end{pmatrix}$$

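If the assumption $\mathbb{E}[u \mid X = x] = 0$ is not verified (or $\mathbb{E}\left[ X^\top U \right] \ne 0$), we estimate the
parameter $\beta$ by the method of instrumental variables (IV). The underlying idea is to find a
set of variables $Z$ such that $\mathbb{E}[u \mid Z = z] = 0$. Since we have:

$$Z^\top Y = Z^\top X \beta + Z^\top U$$
where $Z$ is the $n \times K$ matrix of instrumental variables $z_{i,k}$, we deduce that:
$$\hat{\beta}_{\mathrm{IV}} = \left( Z^\top X \right)^{-1} Z^\top Y$$

It follows that the estimator $\hat{\beta}_{\mathrm{IV}}$ is unbiased:

$$\mathbb{E}\left[ \hat{\beta}_{\mathrm{IV}} \right] = \mathbb{E}\left[ \left( Z^\top X \right)^{-1} Z^\top Y \right] = \beta + \mathbb{E}\left[ \left( Z^\top X \right)^{-1} Z^\top U \right] = \beta$$

and the expression of its variance is:

$$\operatorname{var}\left( \hat{\beta}_{\mathrm{IV}} \right) = \left( Z^\top X \right)^{-1} Z^\top \sigma^2 I_n Z \left( X^\top Z \right)^{-1} = \sigma^2 \left( Z^\top X \right)^{-1} Z^\top Z \left( X^\top Z \right)^{-1}$$

Let us now consider the GMM estimator $\hat{\theta}_{\mathrm{GMM}}$ defined by the two moments (10.18) and
(10.19) and the following orthogonality conditions:

$$h_{i,2+k}(\theta) = u_i z_{i,k} = \left( y_i - \sum_{k'=1}^{K} \beta_{k'} x_{i,k'} \right) z_{i,k} \qquad (10.21)$$

We can show that $\hat{\beta}_{\mathrm{GMM}} = \hat{\beta}_{\mathrm{IV}}$. The method of instrumental variables is then a special case
of the generalized method of moments⁹.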
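The formula $\hat{\beta}_{\mathrm{IV}} = (Z^\top X)^{-1} Z^\top Y$ can be illustrated on simulated data where the regressor is correlated with the residual. Everything in this sketch is a hypothetical setup chosen for the illustration: the data generating process, the true coefficient of 2, the seed and the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical design: z is the instrument, v makes x endogenous.
z = rng.standard_normal(n)
v = rng.standard_normal(n)
e = rng.standard_normal(n)
x = z + v                 # regressor correlated with the instrument
u = 0.8 * v + e           # residual correlated with x through v, not with z
y = 2.0 * x + u           # true coefficient equal to 2

X = x[:, None]
Z = z[:, None]
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # biased because E[x u] != 0
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'Y
print(beta_ols, beta_iv)
```

In this design the OLS estimate converges to $2 + 0.8 / 2 = 2.4$, whereas the IV estimate stays close to the true value.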


⁹We also notice that the previous analysis is also valid for non-linear models:
$$y_i = f(x_i, \beta) + u_i$$
We just have to replace $u_i$ by the expression $y_i - f(x_i, \beta)$ in Equations (10.18), (10.19) and (10.21).
Some examples The general form of an ARCH model is (Engle, 1982):

$$y_t = x_t^\top \beta + u_t$$

where $u_t \sim \mathcal{N}\left(0, h_t^2\right)$ and $h_t^2 = \alpha_0 + \sum_{j=1}^{q} \alpha_j u_{t-j}^2$. The conditional variance is then an
autoregressive process¹⁰. The first two moments are $\mathbb{E}[u_t] = 0$ and $\mathbb{E}\left[ u_t^2 - h_t^2 \right] = 0$. We can
also impose the $K$ orthogonality conditions $\mathbb{E}\left[ u_t x_{t,k} \right] = 0$ for all $k = 1, \ldots, K$. However,
these different moments are not enough for estimating the parameters $\alpha_j$. This is why we
have to impose the $q$ orthogonality conditions between the centered innovations $u_t^2 - h_t^2$ and
the lagged residuals $u_{t-j}^2$:

$$\mathbb{E}\left[ \left( u_t^2 - h_t^2 \right) u_{t-j}^2 \right] = 0$$
for $j = 1, \ldots, q$. This estimation method based on GMM has been suggested by Mark
(1988).


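In the standard linear time series model $y_t = x_t^\top \beta + u_t$, we assume that $u_t \sim \mathcal{N}\left(0, \sigma^2\right)$.
If we also assume that the residuals are autocorrelated, $u_t = \rho u_{t-1} + \varepsilon_t$, we obtain:

$$\mathbb{E}\left[ u_t u_{t-1} \right] = \mathbb{E}\left[ (\rho u_{t-1} + \varepsilon_t) u_{t-1} \right] = \rho \, \mathbb{E}\left[ u_{t-1}^2 \right] + \mathbb{E}\left[ \varepsilon_t u_{t-1} \right] = \rho \sigma^2$$

The GMM estimator consists in using Equations (10.18), (10.19) and (10.20) and adding
the following orthogonality moment:

$$h_{t,K+3}(\theta) = u_t u_{t-1} - \rho \sigma^2$$

where the vector of parameters $\theta$ becomes $(\beta, \sigma, \rho)$.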
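Let $h_t(\theta) = (h_{t,1}(\theta), \ldots, h_{t,m}(\theta))$ be the $m \times 1$ vector of empirical moments for the observation date $t$. We have seen that the estimate of $S = \mathbb{E}\left[h_t(\theta) h_t(\theta)^\top\right]$ is equal to:
$$\hat{S} = \frac{1}{T} \sum_{t=1}^{T} h_t(\hat{\theta})\, h_t(\hat{\theta})^\top$$
This implies the assumption that the empirical moments are not autocorrelated. However, when dealing with time series, this hypothesis is generally not satisfied and it is better to use a heteroscedasticity and autocorrelation consistent (HAC) estimator:
$$\hat{S}_{\mathrm{HAC}} = \hat{\Gamma}(0) + \sum_{j=1}^{\ell} w_j \left(\hat{\Gamma}(j) + \hat{\Gamma}(j)^\top\right)$$
where:
$$\hat{\Gamma}(j) = \frac{1}{T} \sum_{t=j+1}^{T} h_t(\hat{\theta})\, h_{t-j}(\hat{\theta})^\top$$
and $w_j$ is the weight function. Newey and West (1987) showed that the Bartlett kernel defined as $w_j = 1 - j/(\ell + 1)$ is a simple method that is consistent under fairly general conditions.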
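The HAC estimator with Bartlett weights can be sketched in a few lines for the scalar case $m = 1$, where $\hat{\Gamma}(j) + \hat{\Gamma}(j)^\top$ reduces to $2\hat{\Gamma}(j)$. This is only an illustration: the function names `autocovariance` and `newey_west` and the toy series are assumptions, not part of the text.

```python
def autocovariance(h, j):
    """Empirical autocovariance Gamma(j) of a scalar moment series h_1, ..., h_T.

    The moments are not demeaned because they have zero expectation
    at the true parameter value.
    """
    T = len(h)
    return sum(h[t] * h[t - j] for t in range(j, T)) / T


def newey_west(h, lags):
    """Scalar HAC estimate with Bartlett weights w_j = 1 - j / (lags + 1)."""
    s = autocovariance(h, 0)
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        s += 2.0 * w * autocovariance(h, j)  # Gamma(j) + Gamma(j)' = 2 Gamma(j) when m = 1
    return s


# Hypothetical, strongly negatively autocorrelated moment series.
h = [1.0, -1.0, 1.0, -1.0]
print(newey_west(h, 0), newey_west(h, 1))
```

With `lags = 0` the function returns the estimate without autocorrelation correction, which makes the effect of the HAC adjustment easy to compare.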


¹⁰ The term ARCH means autoregressive conditional heteroscedasticity.

10.1.3.3 Simulated method of moments


When a model is complex, it may be difficult to find an analytical expression for each 
moment condition. The basic idea behind the simulated method of moments (SMM) is then 
to simulate the model and to replace the theoretical moments by the simulated moments. 
The theory of SMM has been formulated by McFadden (1989), Pakes and Pollard (1989) 
and Duffie and Singleton (1993), and this approach is particularly popular for dynamic asset 
pricing models. 

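If we consider the method of moments, we have:
$$\hat{g}_j(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i^j - \hat{m}_j(\theta)\right)$$
where $\hat{m}_j(\theta)$ is the $j$-th simulated moment computed with $n_S$ simulations. For the generalized method of moments, we obtain:
$$\hat{g}_j(\theta) = \frac{1}{n} \sum_{i=1}^{n} \hat{h}_{i,j}(\theta)$$
where $\hat{h}_{i,j}(\theta)$ is the $j$-th simulated orthogonal condition for the $i$-th observation. The SMM estimator is then defined exactly as the GMM estimator:
$$\hat{\theta} = \arg\min\, \hat{g}(\theta)^\top W \hat{g}(\theta)$$
Like the GMM estimator, the SMM estimator $\hat{\theta}$ is asymptotically normally distributed (Duffie and Singleton, 1993):
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \rightarrow \mathcal{N}\left(0, (1 + \tau)\, V\right)$$
where $V = \left(D^\top S^{-1} D\right)^{-1}$ is the GMM covariance matrix and $n_S$ is the number of simulations. Therefore, the covariance matrix of $\hat{\theta}$ depends on the ratio $\tau = n / n_S$:
$$\operatorname{var}(\hat{\theta}) = \frac{1}{n} (1 + \tau)\, \hat{V} \qquad (10.22)$$
This implies that the number of simulations $n_S$ must be larger than the number of observations:
$$n_S \gg n$$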
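To get a feeling for the factor $1 + \tau$, the short sketch below computes by how much the asymptotic standard errors of the SMM estimator exceed those of the GMM estimator. The function name is an assumption, and the values $n = 10$, $n_S = 25$ and $n_S = 200$ are chosen only for illustration.

```python
import math


def smm_std_error_inflation(n, n_s):
    """Ratio of SMM to GMM asymptotic standard errors, sqrt(1 + n / n_s)."""
    tau = n / n_s
    return math.sqrt(1.0 + tau)


# Illustrative values: a small sample with few or many simulations.
for n_s in (25, 200):
    print(n_s, smm_std_error_inflation(10, n_s))
```

The ratio tends to one when $n_S$ becomes large relative to $n$, which is the reason for the condition $n_S \gg n$.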


Remark 123 The key point when considering the simulated method of moments is to use 
the same random numbers at each iteration of the optimization algorithm in order to ensure 
the convergence of the SMM estimator. 
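The role of common random numbers can be illustrated with the model $Y = XU$, where $X$ is Gaussian and $U$ is uniform. The sketch below is a simplified illustration under several assumptions that are not taken from the text: the two matched moments are the first two raw moments, the weighting matrix is the identity, the minimization is a crude grid search, and all names and numerical values are hypothetical.

```python
import random


def draw_common_numbers(n_s, seed=123):
    """Draw the (normal, uniform) pairs once; they are reused at every iteration."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.random()) for _ in range(n_s)]


def simulated_moments(mu, sigma, draws):
    """First two raw moments of y = (mu + sigma * n) * u over the fixed draws."""
    ys = [(mu + sigma * n) * u for n, u in draws]
    m1 = sum(ys) / len(ys)
    m2 = sum(y * y for y in ys) / len(ys)
    return m1, m2


def smm_objective(mu, sigma, data, draws):
    """Quadratic form g' W g with W = identity (an illustrative choice)."""
    m1, m2 = simulated_moments(mu, sigma, draws)
    g1 = sum(y - m1 for y in data) / len(data)
    g2 = sum(y * y - m2 for y in data) / len(data)
    return g1 * g1 + g2 * g2


def smm_grid_search(data, draws, mus, sigmas):
    """Crude minimizer: evaluate the objective on a grid of candidate values."""
    best = min((smm_objective(m, s, data, draws), m, s) for m in mus for s in sigmas)
    return best[1], best[2]


# Hypothetical data set generated with mu = 1 and sigma = 0.5.
data_rng = random.Random(2024)
data = [(1.0 + 0.5 * data_rng.gauss(0.0, 1.0)) * data_rng.random() for _ in range(200)]
draws = draw_common_numbers(2000)
mus = [0.5 + 0.05 * i for i in range(21)]
sigmas = [0.1 + 0.05 * i for i in range(19)]
print(smm_grid_search(data, draws, mus, sigmas))
```

Because `draws` is generated once and passed to every evaluation, the objective is a deterministic function of $(\mu, \sigma)$. Redrawing the random numbers inside `smm_objective` would add simulation noise to each evaluation and could prevent an optimizer from converging.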


In order to illustrate the simulated method of moments, we consider Example 106 on 
page 629. The simulated moments are: 


and:

where $y_{(s)}$ is a simulated value of $Y = XU$ where $X \sim \mathcal{N}(\mu, \sigma^2)$, $U \sim \mathcal{U}_{[0,1]}$ and $X \perp U$. We have:
$$y_{(s)} = x_{(s)} \cdot u_{(s)} = \left(\mu + \sigma n_{(s)}\right) \cdot u_{(s)}$$
where $u_{(s)}$ and $n_{(s)}$ are uniform and standard normal random numbers. In Table 10.8, we compare the results obtained with the GMM and the SMM approaches. For the simulated method of moments, we consider 20 000 Monte Carlo replications and report the average values. We notice that when the number of simulations is low, the estimator can be biased. For example, $\hat{\sigma}$ is equal to 0.880 on average when $n_S$ is equal to 25, whereas the GMM estimate is equal to 0.836. We also notice that the standard errors $\hat{\sigma}(\hat{\mu})$ and $\hat{\sigma}(\hat{\sigma})$ of the estimated parameters are higher for the SMM estimator than for the GMM estimator because of the factor $1 + \tau$. However, these results are based on the asymptotic theory. When the number of observations is low ($n = 10$), the approximation of the covariance matrix by Equation (10.22) is not valid. For instance, if we calculate the standard errors by calculating the standard deviation of the MC estimates, we obtain the values $\hat{\sigma}_{\mathrm{MC}}(\hat{\mu})$ and $\hat{\sigma}_{\mathrm{MC}}(\hat{\sigma})$ given in the last two columns. When the number of simulations is large and the number of observations is small, the asymptotic theory then overestimates the standard errors. Indeed, in the case where $n_S = 200$, the MC standard errors are $\hat{\sigma}_{\mathrm{MC}}(\hat{\mu}) = 0.070$ and $\hat{\sigma}_{\mathrm{MC}}(\hat{\sigma}) = 0.062$ whereas we obtain $\hat{\sigma}(\hat{\mu}) = 0.315$ and $\hat{\sigma}(\hat{\sigma}) = 0.172$ calculated with Equation (10.22).

TABLE 10.8: Comparison of GMM and SMM estimates

Method   $n_S$   $\hat{\mu}$   $\hat{\sigma}$   $\hat{\sigma}(\hat{\mu})$   $\hat{\sigma}(\hat{\sigma})$   $\hat{\sigma}_{\mathrm{MC}}(\hat{\mu})$   $\hat{\sigma}_{\mathrm{MC}}(\hat{\sigma})$
GMM              −0.109        0.836            0.306                       0.165
SMM      25      −0.115        0.880            0.373                       0.218                          0.214                                     0.340
SMM      200     −0.102        0.847            0.315                       0.172                          0.070                                     0.062


In Chapter 5 dedicated to operational risk, we have seen how to estimate the parameters $\theta$ of the severity distribution with the method of maximum likelihood or the generalized method of moments. We recall that $\{x_1, \ldots, x_T\}$ is the sample of losses collected for a given cell of the operational risk matrix. If we assume that the losses are log-normally distributed, the orthogonal conditions are:
$$\begin{cases}
h_{i,1}(\theta) = x_i - e^{\mu + \frac{1}{2}\sigma^2} \\
h_{i,2}(\theta) = \left(x_i - e^{\mu + \frac{1}{2}\sigma^2}\right)^2 - e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)
\end{cases}$$
Let us now assume that we do not collect individual losses but aggregated losses. In this case, the sample is defined by $\{(n_1, x_1), \ldots, (n_T, x_T)\}$ where $n_i$ is the number of individual losses and $x_i$ is the aggregated loss for the $i$-th observation. Since the individual losses are independent, the orthogonal conditions become:
$$\begin{cases}
h_{i,1}(\theta) = x_i - n_i e^{\mu + \frac{1}{2}\sigma^2} \\
h_{i,2}(\theta) = \left(x_i - n_i e^{\mu + \frac{1}{2}\sigma^2}\right)^2 - n_i e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)
\end{cases}$$


We have seen that data collection in operational risk is impacted by truncation, because data are recorded only when their amounts are higher than a threshold $H$. On page 320, we were able to calculate the theoretical moments of truncated individual losses. However, it is impossible to find the theoretical moments of truncated aggregated losses and the generalized method of moments cannot be applied. Nevertheless, we can consider the simulated method of moments with the following orthogonal conditions:
$$\begin{cases}
h_{i,1}(\theta) = x_i - \hat{m}_{i,1}(\theta) \\
h_{i,2}(\theta) = \left(x_i - \hat{m}_{i,1}(\theta)\right)^2 - \hat{m}_{i,2}(\theta)
\end{cases}$$
where:
$$\hat{m}_{i,1}(\theta) = \frac{1}{n_S} \sum_{s=1}^{n_S} x_{i,(s)}$$
and:
$$\hat{m}_{i,2}(\theta) = \frac{1}{n_S} \sum_{s=1}^{n_S} x_{i,(s)}^2 - \hat{m}_{i,1}^2(\theta)$$
In the case of the log-normal distribution, each aggregated loss is simulated as follows:
$$x_{i,(s)} = \sum_{j=1}^{n_i} e^{\mu + \sigma u_{j,(s)}}$$
where $u_{j,(s)} \sim \mathcal{N}(0, 1)$. Let us consider an example with 20 observations: 1404, 1029, 2607, 2369, 2163, 2730, 4045, 1147, 2319, 2521, 2021, 1528, 1715, 2547, 1039, 1853, 3515, 1273, 2048 and 2744. Each observation corresponds to an aggregated sum of 5 individual losses¹¹. If individual losses are log-normally distributed, we obtain the following results: $\hat{\mu}_{\mathrm{GMM}} = 5.797$, $\hat{\sigma}_{\mathrm{GMM}} = 0.718$, $\hat{\mu}_{\mathrm{SMM}} = 5.796$ and $\hat{\sigma}_{\mathrm{SMM}} = 0.722$. We notice that the SMM estimates are close to the GMM estimates¹². If we now assume that the aggregated losses have been collected above the threshold $H = 1000$, we obtain $\hat{\mu}_{\mathrm{SMM}} = 5.763$ and $\hat{\sigma}_{\mathrm{SMM}} = 0.745$. Truncation therefore changes the estimated parameters.
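The simulation step behind these orthogonal conditions can be sketched as follows. The treatment of truncation is an assumption made for this illustration: a simulated aggregated loss is kept only if it reaches the threshold, and it is redrawn otherwise. The function names and the seed are hypothetical; the data and the parameter values are those quoted in the example.

```python
import math
import random


def simulate_aggregated_losses(mu, sigma, n_losses, n_sims, threshold=0.0, seed=0):
    """Simulate sums of n_losses log-normal losses, keeping only sums >= threshold."""
    rng = random.Random(seed)
    sims = []
    while len(sims) < n_sims:
        x = sum(math.exp(mu + sigma * rng.gauss(0.0, 1.0)) for _ in range(n_losses))
        if x >= threshold:  # rejection step (assumed handling of truncation)
            sims.append(x)
    return sims


def simulated_moments(sims):
    """Simulated mean and centered second moment of the aggregated loss."""
    m1 = sum(sims) / len(sims)
    m2 = sum(x * x for x in sims) / len(sims) - m1 * m1
    return m1, m2


losses = [1404, 1029, 2607, 2369, 2163, 2730, 4045, 1147, 2319, 2521,
          2021, 1528, 1715, 2547, 1039, 1853, 3515, 1273, 2048, 2744]

# Compare the sample mean with the simulated mean at the GMM estimates.
m1, m2 = simulated_moments(simulate_aggregated_losses(5.797, 0.718, 5, 20000))
print(sum(losses) / len(losses), m1)
```

Plugging `simulated_moments` into the orthogonal conditions and minimizing over $(\mu, \sigma)$ with fixed random numbers would give an SMM estimate; that optimization step is not shown here.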


10.1.4 Non-parametric estimation 


In the previous paragraphs, we specified a parametric model, that is a statistical model which depends on a parameter vector $\theta$. Therefore, the estimation of the model consists in estimating the parameter vector $\theta$. Using $\hat{\theta}$, we can determine some quantity of interest, for example probabilities, quantiles and expectations. In the case of non-parametric models, we directly estimate the quantity of interest without specifying a probability distribution or a statistical model.


10.1.4.1 Non-parametric density estimation 


Histogram estimator. Let $X$ be a random variable with continuous distribution function $F(x)$. Using the sample $\{x_1, \ldots, x_n\}$, we can estimate $F(x)$ by the empirical distribution function:
$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{x_i \leq x\}$$
where $\hat{F}(x)$ is the percentage of observations that are lower than $x$. If we now consider the estimation of the density $f(x)$ from a sample $\{x_1, \ldots, x_n\}$, we have $\mathrm{d}F(x) = f(x)\,\mathrm{d}x$ and we deduce that:
$$\begin{aligned}
\hat{f}(x) &= \frac{\mathrm{d}\hat{F}(x)}{\mathrm{d}x} \\
&= \frac{\hat{F}(x + h) - \hat{F}(x - h)}{2h} \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbb{1}\{x - h \leq x_i \leq x + h\}}{2h}
\end{aligned}$$
The density estimator $\hat{f}(x)$ is known as the histogram. It counts the percentage of observations that belong to the interval $[x - h, x + h]$. The issue with the histogram estimator is that the density function $\hat{f}(x)$ is not smooth and is sensitive to the bandwidth $h$. To obtain a continuous density, we can specify a parametric density function $f(x; \theta)$, estimate the parameter $\theta$ by maximum likelihood and assume that $\hat{f}(x) = f(x; \hat{\theta}_{\mathrm{ML}})$. By construction, the estimator $f(x; \hat{\theta}_{\mathrm{ML}})$ is continuous, but it is biased because the statistical model $f(x; \theta)$ is not necessarily the right model.

¹¹ We have $n_i = 5$.

¹² Of course, the SMM estimates depend on the number of simulations and the seed of the random number generators. Here, the results have been obtained with 2000 simulations.


Kernel estimator. We notice that the previous estimator $\hat{f}(x)$ can be written as:
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \qquad (10.23)$$
where $K$ is the uniform density function on $[-1, 1]$:
$$K(u) = \frac{1}{2} \cdot \mathbb{1}\{-1 \leq u \leq 1\}$$
$K$ is also called a rectangular kernel. The idea is then to replace this function by other window functions that are sufficiently smooth and satisfy some properties (Silverman, 1986):

• $K(u) \geq 0$ to ensure the positivity of the density;
• $\int_{-\infty}^{\infty} K(u)\,\mathrm{d}u = 1$ to verify that $\hat{F}(\infty) = 1$.

Moreover, the symmetry property $K(u) = K(-u)$ is generally added. We can then show that:
$$\mathbb{E}\left[\hat{f}(x)\right] - f(x) \approx \frac{h^2}{2} f''(x) \int_{-\infty}^{\infty} u^2 K(u)\,\mathrm{d}u$$
and:
$$\operatorname{var}\left(\hat{f}(x)\right) \approx \frac{1}{nh} f(x) \int_{-\infty}^{\infty} K^2(u)\,\mathrm{d}u$$
The bias of $\hat{f}(x)$ depends then on the second moment of the kernel $\mu_2(K) = \int_{-\infty}^{\infty} u^2 K(u)\,\mathrm{d}u$ whereas the variance of $\hat{f}(x)$ is related to the roughness of the kernel $R(K) = \int_{-\infty}^{\infty} K^2(u)\,\mathrm{d}u$. We also notice that the bias is proportional to the curvature $f''(x)$ and the variance is inversely proportional to the number of observations $n$. The most popular kernel functions are the Gaussian kernel, for which $K(u) = \phi(u)$ and $\mathcal{I}(u) = \Phi(u)$, and the Epanechnikov kernel¹³, for which $K(u) = \frac{3}{4}(1 - u^2) \cdot \mathbb{1}\{|u| \leq 1\}$ and $\mathcal{I}(u) = \min\left(\frac{1}{4}(3u - u^3 + 2) \cdot \mathbb{1}\{u > -1\}, 1\right)$. The difficulty is the choice of the bandwidth $h$ since there is a trade-off between bias and variance (Jones et al., 1996). For the Gaussian kernel, a rule of thumb is $h = 1.06 \cdot \hat{\sigma} \cdot n^{-1/5}$ where $\hat{\sigma}$ is the standard deviation of the sample $\{x_1, \ldots, x_n\}$.

¹³ The Epanechnikov kernel is often called the optimal kernel because it minimizes the mean squared error.
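The kernel estimator (10.23) with a Gaussian kernel and the rule-of-thumb bandwidth can be sketched as below. The sample of returns is hypothetical, the function names are assumptions, and the standard deviation is computed with the $n - 1$ denominator, which is one possible convention.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(sample):
    """h = 1.06 * sigma_hat * n^(-1/5) for the Gaussian kernel."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def kernel_density(x, sample, h):
    """Kernel estimate f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


# Hypothetical weekly returns (in %).
returns = [-3.1, -1.2, -0.4, 0.1, 0.3, 0.8, 1.1, 1.9, 2.4, 4.0]
h = rule_of_thumb_bandwidth(returns)
for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(x, round(kernel_density(x, returns, h), 4))
```

Replacing `gaussian_kernel` by the rectangular kernel gives back the histogram estimator, which makes the loss of smoothness easy to visualize.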


Remark 124 From Equation (10.23), we deduce that:
$$\hat{F}(x) = \int_{-\infty}^{x} \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{t - x_i}{h}\right) \mathrm{d}t = \frac{1}{n} \sum_{i=1}^{n} \mathcal{I}\left(\frac{x - x_i}{h}\right) \qquad (10.24)$$
where $\mathcal{I}(u) = \int_{-\infty}^{u} K(t)\,\mathrm{d}t$ is the integrated kernel.

In the case of a multivariate distribution function, we have:
$$\hat{f}(x_1, \ldots, x_m) = \frac{1}{n h^m} \sum_{i=1}^{n} K\left(\frac{x_1 - x_{1,i}}{h}, \ldots, \frac{x_m - x_{m,i}}{h}\right)$$
where $K(u)$ is now a multidimensional kernel function¹⁴ that satisfies $K(u) \geq 0$, $\int K(u)\,\mathrm{d}u = 1$, $\int u K(u)\,\mathrm{d}u = 0$ and $\int u u^\top K(u)\,\mathrm{d}u = I_m$. Generally, the multidimensional kernel $K(u)$ is defined as the product of univariate kernels $K(u) = \prod_{j=1}^{m} K_j(u_j)$ where $K_j$ is the kernel for the $j$-th marginal distribution.

FIGURE 10.5: Histogram of the weekly returns of the S&P 500 index


¹⁴ We have $u = (u_1, \ldots, u_m)$.

An example. In Figure 10.5, we have reported the histogram of weekly returns $R_t$ of the S&P 500 index¹⁵ for different values of $h$. We verify that the histogram is very sensitive to the parameter $h$. We now consider the estimation of the probability density function of $R_t$. For that, we consider 4 statistical models:

1. the first model assumes that weekly returns are Gaussian;

2. the second model assumes that weekly returns are distributed according to the Student's $t_4$ distribution;

3. the third model is a variant of the second model, but the Student's $t$ distribution has 2 degrees of freedom;

4. finally, we use a Gaussian kernel with $h = 1\%$ for the fourth model.

Results are reported in Figure 10.6. In order to illustrate the impact of the model choice on distribution tails, we consider the order statistic $1:52$, meaning that we estimate the density function of the worst weekly return over one year. On page 755, we show that the distribution function of the random variable $X_{1:n}$ is equal to $F_{1:n}(x) = 1 - (1 - F(x))^n$. We deduce that the estimated probability density function is equal to:
$$\hat{f}_{1:n}(x) = n \left(1 - \hat{F}(x)\right)^{n-1} \hat{f}(x)$$
Results are given in Figure 10.7. We observe significant differences between the four models. Since the non-parametric estimation is less biased, we conclude that the Student's $t_4$ distribution is the most appropriate parametric distribution function.
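The density of the worst return can be computed directly from any pair $(F, f)$. The sketch below uses a Gaussian model as an illustration; the mean of 0% and the standard deviation of 2.5% are hypothetical values, not estimates from the text, and the function names are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def min_order_density(x, n, pdf, cdf):
    """Density of the minimum of n i.i.d. draws: n * (1 - F(x))^(n - 1) * f(x)."""
    return n * (1.0 - cdf(x)) ** (n - 1) * pdf(x)


# Hypothetical Gaussian model for weekly returns (in %).
mu, sigma = 0.0, 2.5
pdf = lambda x: normal_pdf(x, mu, sigma)
cdf = lambda x: normal_cdf(x, mu, sigma)
for x in (-10.0, -7.5, -5.0, -2.5):
    print(x, round(min_order_density(x, 52, pdf, cdf), 4))
```

Passing a kernel density and its integrated kernel instead of `pdf` and `cdf` would give the non-parametric version of the same quantity.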


FIGURE 10.6: Density estimation of the weekly returns of the S&P 500 index (four panels: Gaussian distribution, Student $t_4$, Student $t_2$ and Gaussian kernel; x-axis: return in %)

¹⁵ The study period is January 2000 to December 2015.

FIGURE 10.7: Density estimation of the worst weekly return over one year


10.1.4.2 Non-parametric regression 


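Nadaraya-Watson regression. On page 603, we have shown that the conditional expectation problem is to find the function $m(x)$ such that:
$$y = \mathbb{E}[Y \mid X = x] = m(x)$$
In the case where $X$ is a scalar random variable, it follows that:
$$\begin{aligned}
m(x) &= \int_{\mathbb{R}} y\, f_{y|x}(y; x)\,\mathrm{d}y \\
&= \int_{\mathbb{R}} y\, \frac{f_{x,y}(x, y)}{f_x(x)}\,\mathrm{d}y \\
&= \frac{\int_{\mathbb{R}} y\, f_{x,y}(x, y)\,\mathrm{d}y}{f_x(x)}
\end{aligned}$$
where $f_{x,y}$ is the joint density of $(X, Y)$, $f_x$ is the density of $X$ and $f_{y|x}$ is the conditional density of $Y$ given that $X = x$. We deduce that an estimator of $m(x)$ is:
$$\hat{m}(x) = \frac{\int_{\mathbb{R}} y\, \hat{f}_{x,y}(x, y)\,\mathrm{d}y}{\hat{f}_x(x)} = \frac{\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) y_i}{\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)}$$
because we have:
$$\int_{\mathbb{R}} y\, \hat{f}_{x,y}(x, y)\,\mathrm{d}y = \frac{1}{nh^2} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \int_{\mathbb{R}} y\, K\left(\frac{y - y_i}{h}\right) \mathrm{d}y = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) y_i$$
and:
$$\hat{f}_x(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
Finally, we deduce that $\hat{m}(x)$ is a weighted sum of the sample data $\{y_1, \ldots, y_n\}$ of $Y$:
$$\hat{m}(x) = \frac{\sum_{i=1}^{n} w_i(x)\, y_i}{\sum_{i=1}^{n} w_i(x)}$$
where the weights are the kernel values applied to the sample data $\{x_1, \ldots, x_n\}$ of $X$:
$$w_i(x) = K\left(\frac{x - x_i}{h}\right)$$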
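The weighted-sum form of the Nadaraya-Watson estimator translates directly into code. The sketch below uses a Gaussian kernel; the simulated sample, the uniform design for $x$, the bandwidth $h = 0.05$ and the function names are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xs, ys, h):
    """Estimate m(x) as sum_i w_i(x) y_i / sum_i w_i(x) with w_i(x) = K((x - x_i) / h)."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical sample: y = sin(9 x) + u with u uniform on [-0.5, 0.5].
rng = random.Random(0)
xs = [rng.random() for _ in range(1000)]
ys = [math.sin(9.0 * x) + rng.random() - 0.5 for x in xs]
for x in (0.2, 0.5, 0.8):
    print(x, round(nadaraya_watson(x, xs, ys, 0.05), 3), round(math.sin(9.0 * x), 3))
```

The estimate is a convex combination of the observed $y_i$, so it can never leave the range of the data, which is one source of bias near peaks and troughs of $m(x)$.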


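Local polynomial regression. We consider the following least squares problem:
$$\hat{\beta}_0(x) = \arg\min_{\beta_0 \in \mathbb{R}} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) (y_i - \beta_0)^2$$
The first-order condition is:
$$\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \left(y_i - \hat{\beta}_0\right) = 0$$
It follows that $\hat{\beta}_0(x) = \hat{m}(x)$. Therefore, the Nadaraya-Watson regression is a weighted regression with a constant:
$$y_i = \beta_0 + u_i$$
where $w_i(x) = K\left(\frac{x - x_i}{h}\right)$. Since the weights $w_i(x)$ depend on the value $x$, $\hat{\beta}_0$ depends also on the value $x$ and we use the notation $\hat{\beta}_0(x)$.

We can extend the local constant model to the local polynomial model:
$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j (x_i - x)^j + u_i$$
The least squares problem becomes:
$$\hat{\beta}(x) = \arg\min \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j (x_i - x)^j\right)^2$$
We know that the WLS estimate is:
$$\hat{\beta}(x) = \left(X^\top W X\right)^{-1} X^\top W Y$$
where $W$ is a diagonal matrix with $W_{i,i} = w_i(x)$, $Y$ is the column vector $(y_1, \ldots, y_n)$ and $X$ is the design matrix:
$$X = \begin{pmatrix}
1 & (x_1 - x) & (x_1 - x)^2 & \cdots & (x_1 - x)^p \\
1 & (x_2 - x) & (x_2 - x)^2 & \cdots & (x_2 - x)^p \\
\vdots & \vdots & \vdots & & \vdots \\
1 & (x_n - x) & (x_n - x)^2 & \cdots & (x_n - x)^p
\end{pmatrix}$$
Moreover, we have:
$$\begin{aligned}
\mathbb{E}[Y \mid X = x] &= \mathbb{E}\left[\beta_0 + \sum_{j=1}^{p} \beta_j (X - x)^j + U \,\middle|\, X = x\right] \\
&= \beta_0
\end{aligned}$$
We deduce that the conditional expectation is again equal to the intercept of the weighted polynomial regression:
$$\hat{m}(x) = \hat{\beta}_0(x)$$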
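For $p = 1$, the WLS problem reduces to a $2 \times 2$ system of normal equations that can be solved by hand. The sketch below is one way to do it with a Gaussian kernel; the function names, the sample and the bandwidth are assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear(x, xs, ys, h):
    """Local linear fit at x; returns (beta_0(x), beta_1(x)).

    Solves the normal equations (X'WX) beta = X'WY with regressors 1 and (x_i - x).
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        w = gaussian_kernel((x - xi) / h)
        d = xi - x
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    beta0 = (s2 * t0 - s1 * t1) / det
    beta1 = (s0 * t1 - s1 * t0) / det
    return beta0, beta1


# Hypothetical noise-free sample from y = sin(9 x).
xs = [i / 200.0 for i in range(201)]
ys = [math.sin(9.0 * x) for x in xs]
print(local_linear(0.5, xs, ys, 0.05), math.sin(4.5))
```

The intercept `beta0` is the estimate of $m(x)$, while `beta1` estimates the first derivative $m'(x)$. A local linear fit reproduces any exactly linear relationship without bias, which the local constant fit does not do near the boundaries of the sample.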


Application to quantile regression. On page 613, we have seen that the quantile regression:
$$\Pr\{Y \leq q_\alpha(x) \mid X = x\} = \alpha$$
can be formulated as an M-estimator:
$$\hat{q}_\alpha(x) = \hat{\beta}_0 + \hat{\beta}^\top x$$
where:
$$(\hat{\beta}_0, \hat{\beta}) = \arg\min \sum_{i=1}^{n} \rho\left(y_i - \beta_0 - \beta^\top x_i\right)$$
and $\rho(u) = \chi_\alpha(u) = u \cdot (\alpha - \mathbb{1}\{u < 0\})$. Let us consider the local polynomial regression:
$$\hat{\beta}(x) = \arg\min \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \rho\left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j (x_i - x)^j\right)$$
We deduce that:
$$\hat{q}_\alpha(x) = \hat{\beta}_0(x)$$
To estimate the conditional quantile $\hat{q}_\alpha(x)$, we can then use a classical quantile regression procedure by considering the regressor values $(x_i - x)^j$ for $j = 1, \ldots, p$ and the weighting matrix based on the kernel function.
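In the local constant case ($p = 0$), the weighted minimization of the check function has a simple solution: a weighted empirical quantile of the $y_i$. The sketch below illustrates this special case only; the function names and the data are hypothetical, and higher polynomial orders would require a linear programming or iterative solver that is not shown.

```python
def check_loss(u, alpha):
    """Check function rho(u) = u * (alpha - 1{u < 0})."""
    return u * (alpha - (1.0 if u < 0 else 0.0))


def weighted_quantile(ys, weights, alpha):
    """Smallest y whose cumulated weight reaches alpha times the total weight.

    This value minimizes sum_i w_i * rho(y_i - b0) over b0.
    """
    pairs = sorted(zip(ys, weights))
    total = sum(weights)
    cum = 0.0
    for y, w in pairs:
        cum += w
        if cum >= alpha * total:
            return y
    return pairs[-1][0]


# Hypothetical observations with kernel weights w_i(x) already computed.
ys = [3.0, 1.0, 4.0, 1.5, 9.0]
ws = [1.0, 2.0, 1.0, 0.5, 0.5]
print(weighted_quantile(ys, ws, 0.5), weighted_quantile(ys, ws, 0.9))
```

With equal weights the function returns an ordinary empirical quantile; with kernel weights centered at $x$ it gives a local constant estimate of $q_\alpha(x)$.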


Examples. We consider the additive noise model:
$$y = \sin(9x) + u \qquad (10.25)$$
where $u \sim \mathcal{U}_{[-0.5, 0.5]}$. The conditional expectation function is then $m(x) = \sin(9x)$ whereas the conditional quantile function is $q_\alpha(x) = \sin(9x) + \alpha - \frac{1}{2}$. We simulate 1 000 realizations of Model (10.25) and estimate the functions $m(x)$ and $q_\alpha(x)$. In Figure 10.8, we have reported the estimated functions $\hat{m}(x)$ and $\hat{q}_\alpha(x)$ for different models. We notice that the Nadaraya-Watson estimator is less efficient than the local quadratic estimator. In the case of the quantile regression, we find that the local linear regression gives correct results when the function is quasi-affine. When the second derivative is high enough, results are much more questionable, and local quadratic regression seems more suitable. We obtain a similar conclusion with the multiplicative noise model:
$$y = (\cos(2\pi x - \pi) + 1)\, u \qquad (10.26)$$
where $u \sim \mathcal{U}_{[0, 1]}$. We have $m(x) = \frac{1}{2}(\cos(2\pi x - \pi) + 1)$ and $q_\alpha(x) = \alpha\,(\cos(2\pi x - \pi) + 1)$. In Figure 10.9, we report the estimates of the quantile regression based on 1 000 simulations. We notice that the local quadratic quantile regression may also present some bias. This is due to the small size of the sample ($n = 1000$) compared to the quantile level ($\alpha = 90\%$). Indeed, non-parametric quantile regression may need a large sample size to converge.


FIGURE 10.8: Non-parametric regression of the additive model (four panels: conditional expectation with the Nadaraya-Watson and local quadratic models; conditional quantile at α = 90% with the local linear and local quadratic models)


10.2 Time series modeling 
10.2.1 ARMA process 
10.2.1.1 The VAR(1) process 


Let $y_t$ be an $n$-dimensional process and $\varepsilon_t \sim \mathcal{N}(0, \Sigma)$. We define the vector autoregression model (VAR) of order one as follows:
$$y_t = \mu + \Phi y_{t-1} + \varepsilon_t \qquad (10.27)$$

FIGURE 10.9: Non-parametric regression of the multiplicative model

The VAR(1) model is very interesting from a computational viewpoint because matrix calculus is simple. Moreover, the computations done in this paragraph are straightforward to extend to more complex processes like VARMA or state space models.

We say that the process is stable if the eigenvalues of $\Phi$ have modulus less than one or, equivalently, if the characteristic polynomial $|I_n - \Phi z|$ has its roots outside the unit circle. Under the stability assumption¹⁶, the process has an infinite vector moving average (VMA) representation¹⁷:
$$y_t = (I_n - \Phi)^{-1} \mu + \sum_{i=0}^{\infty} \Phi^i \varepsilon_{t-i}$$
To compute the covariance matrix $\Gamma_y(k) = \mathbb{E}\left[y_t y_{t-k}^\top\right] - \mathbb{E}[y_t]\, \mathbb{E}[y_{t-k}]^\top$, we first calculate $\Gamma_y(0)$ by using the relationship:
$$\operatorname{vec}(\Gamma_y(0)) = \left(I_{n^2} - \Phi \otimes \Phi\right)^{-1} \operatorname{vec}(\Sigma)$$
Then, we calculate $\Gamma_y(k)$ by recursion. Since $\mathbb{E}\left[\varepsilon_t y_{t-k}^\top\right] = 0$ for $k \geq 1$, we have:
$$\begin{aligned}
\Gamma_y(k) &= \mathbb{E}\left[y_t y_{t-k}^\top\right] - \mathbb{E}[y_t]\, \mathbb{E}[y_{t-k}]^\top \\
&= \mathbb{E}\left[(\mu + \Phi y_{t-1} + \varepsilon_t)\, y_{t-k}^\top\right] - \mathbb{E}[\mu + \Phi y_{t-1} + \varepsilon_t]\, \mathbb{E}[y_{t-k}]^\top \\
&= \Phi \left(\mathbb{E}\left[y_{t-1} y_{t-k}^\top\right] - \mathbb{E}[y_{t-1}]\, \mathbb{E}[y_{t-k}]^\top\right) \\
&= \Phi\, \Gamma_y(k - 1)
\end{aligned}$$

Impulse response analysis describes the dynamic evolution of the system after a shock in one variable. For instance, the responses to forecast errors are “the effect of an innovation of one variable to another variable” (Lütkepohl, 2005). The matrix of responses after $k$ periods is $\Lambda_k = \Phi^k$ and we note $\Psi_k = \sum_{j=0}^{k} \Lambda_j$ the matrix of cumulated responses. Sometimes, it is better to consider normalized innovations than forecast errors. In this case, we define the responses to orthogonal impulses by $\Delta_k = \Phi^k P(\Sigma)$ and the matrix of cumulated responses by $\Xi_k = \sum_{j=0}^{k} \Delta_j$ where $P(\Sigma)$ is the Cholesky decomposition matrix of $\Sigma$.

¹⁶ We can show that the stability assumption implies the stationarity assumption (Lütkepohl, 2005).

¹⁷ We have:
$$\lim_{k \to \infty} \sum_{i=0}^{k} \Phi^i = (I_n - \Phi)^{-1}$$
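The autocovariance recursion of a VAR(1) process can be sketched without any linear algebra library. Note that this illustration does not use the vec and Kronecker formula: it obtains $\Gamma_y(0)$ by iterating the equivalent fixed-point equation $\Gamma = \Phi \Gamma \Phi^\top + \Sigma$, which converges under the stability assumption. The helper names and the numerical values are assumptions.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def var1_gamma0(phi, sigma, tol=1e-14, max_iter=100000):
    """Solve Gamma = Phi Gamma Phi' + Sigma by fixed-point iteration (stable Phi only)."""
    gamma = [row[:] for row in sigma]
    for _ in range(max_iter):
        new = mat_add(mat_mul(mat_mul(phi, gamma), transpose(phi)), sigma)
        gap = max(abs(new[i][j] - gamma[i][j])
                  for i in range(len(new)) for j in range(len(new)))
        gamma = new
        if gap < tol:
            break
    return gamma


def var1_autocovariance(phi, sigma, k):
    """Gamma(k) = Phi Gamma(k - 1), started from Gamma(0)."""
    gamma = var1_gamma0(phi, sigma)
    for _ in range(k):
        gamma = mat_mul(phi, gamma)
    return gamma


# Hypothetical stable bivariate example.
phi = [[0.5, 0.1], [0.0, 0.3]]
sigma = [[1.0, 0.2], [0.2, 0.5]]
print(var1_autocovariance(phi, sigma, 0))
print(var1_autocovariance(phi, sigma, 1))
```

For large dimensions the closed-form vec formula or a dedicated Lyapunov solver is preferable; the iteration is used here only because it keeps the example short.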


10.2.1.2 Extension to ARMA models 
A vector ARMA(p,q) process is defined by:
$$y_t - \sum_{i=1}^{p} \Phi_i y_{t-i} = \beta x_t + \varepsilon_t - \sum_{i=1}^{q} \Theta_i \varepsilon_{t-i} \qquad (10.28)$$
where $y_t$ is a process of dimension $n$ and $\varepsilon_t \sim \mathcal{N}(0, \Sigma)$. $x_t$ is a $K \times 1$ vector of exogenous variables. The dimension of matrices $\Phi_i$ and $\Theta_i$ is $n \times n$ whereas $\beta$ is a matrix with dimension $n \times K$. The parameters $p$ and $q$ are the orders of the autoregressive and moving average polynomials.

Remark 125 In the econometric literature, the process (10.28) is known as the VARMAX model because it is a vector process and it contains exogenous variables. If the order $q$ is equal to zero (or $\Theta_i = 0$), then we obtain a VARX model. A VMAX model corresponds to the case $p = 0$ or $\Phi_i = 0$. When there is no exogenous variable except a constant, we use the terms VARMA, VAR and VMA. The terms ARMAX, ARMA, AR and MA are often reserved for the one-dimensional case $n = 1$.

Let us consider the case where exogenous variables reduce to a constant. We have:
$$\left(I_n - \sum_{i=1}^{p} \Phi_i L^i\right) y_t = \mu + \left(I_n - \sum_{i=1}^{q} \Theta_i L^i\right) \varepsilon_t \qquad (10.29)$$
where $L$ is the lag operator. We can write this process as a VAR(1) model:
$$\alpha_t = c + T \alpha_{t-1} + u_t \qquad (10.30)$$
where $\alpha_t = (y_t, \ldots, y_{t-p+1}, \varepsilon_t, \ldots, \varepsilon_{t-q+1})$ is a process of dimension $n(p + q)$. The residuals $u_t$ are equal to $R \varepsilon_t$ where the matrix $R$ is equal to $\begin{pmatrix} I_n & 0 & \cdots & 0 & I_n & 0 & \cdots & 0 \end{pmatrix}^\top$ and has the dimension $n(p + q) \times n$. The vector $c$ is equal to $\begin{pmatrix} \mu & 0 & \cdots & 0 \end{pmatrix}^\top$. The dimension of the matrix $T$ is $n(p + q) \times n(p + q)$ and we have:
$$T = \begin{pmatrix}
\Phi_1 & \cdots & \Phi_{p-1} & \Phi_p & -\Theta_1 & \cdots & -\Theta_{q-1} & -\Theta_q \\
I_n & & (0) & 0 & & & & \\
 & \ddots & & \vdots & & (0) & & \\
(0) & & I_n & 0 & & & & \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
 & & & & I_n & & (0) & 0 \\
 & (0) & & & & \ddots & & \vdots \\
 & & & & (0) & & I_n & 0
\end{pmatrix}$$
We also notice that $y_t = Z \alpha_t$ where the matrix $Z = \begin{pmatrix} I_n & 0 & \cdots & 0 \end{pmatrix}$ has the dimension $n \times n(p + q)$. Using the VAR(1) representation, we easily deduce the expression of the autocovariance matrices, the responses to forecast errors or the responses to orthogonal impulses of $\alpha_t$ and $y_t$.

We can also write the previous VARMA process as a state space model:
$$\begin{cases}
y_t = Z \alpha_t \\
\alpha_t = T \alpha_{t-1} + c + R \varepsilon_t
\end{cases} \qquad (10.31)$$
This representation is very useful since the analysis of state space models also applies to VARMA processes. For instance, it is standard to estimate the parameters of VARMA models by using the Kalman filter, which is presented below. For a VAR(p) process, it is better to use closed-form formulas. The VAR(p) process is defined as follows:
$$y_t = \mu + \Phi_1 y_{t-1} + \ldots + \Phi_p y_{t-p} + \varepsilon_t$$
Using the notations $Y = \begin{pmatrix} y_1 & \cdots & y_T \end{pmatrix}$, $B = \begin{pmatrix} \mu & \Phi_1 & \cdots & \Phi_p \end{pmatrix}$, $X_t = (1, y_t, \ldots, y_{t-p+1})$ and $X = \begin{pmatrix} X_0 & X_1 & \cdots & X_{T-1} \end{pmatrix}$, Lütkepohl (2005) showed that:
$$\hat{B} = Y X^\top \left(X X^\top\right)^{-1}$$


and: 


10.2.2 State space models 
10.2.2.1 Specification and estimation of state space models 


A state space model (SSM) includes a measurement equation and a transition equation. 
In the measurement equation, we define the relationship between an observable system and 
state variables, whereas the transition equation describes the dynamics of state variables. 


Generally, the state vector $\alpha_t$ is generated by a Markov linear process¹⁸:
$$\alpha_t = T_t \alpha_{t-1} + c_t + R_t \eta_t \qquad (10.32)$$
where $\alpha_t$ is an $m \times 1$ vector, $T_t$ is an $m \times m$ matrix, $c_t$ is an $m \times 1$ vector and $R_t$ is an $m \times p$ matrix. In the case of a linear SSM, the measurement equation is given by:
$$y_t = Z_t \alpha_t + d_t + \varepsilon_t \qquad (10.33)$$
where $y_t$ is an $n$-dimensional time series, $Z_t$ is an $n \times m$ matrix and $d_t$ is an $n \times 1$ vector. We also assume that $\eta_t$ and $\varepsilon_t$ are two independent white noise processes of dimension $p$ and $n$ with covariance matrices $Q_t$ and $H_t$.


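Kalman filtering. In the state space model, the variable $y_t$ is observable, but it is generally not the case of the state vector $\alpha_t$. The Kalman filter is a statistical tool to estimate the distribution function of $\alpha_t$. Let $\alpha_0 \sim \mathcal{N}(\hat{\alpha}_0, P_0)$ be the initial position of the state vector. We note $\hat{\alpha}_{t|t}$ (or $\hat{\alpha}_t$) and $\hat{\alpha}_{t|t-1}$ the optimal estimators of $\alpha_t$ given the available information until time $t$ and $t - 1$:
$$\begin{aligned}
\hat{\alpha}_{t|t} &= \mathbb{E}[\alpha_t \mid \mathcal{F}_t] \\
\hat{\alpha}_{t|t-1} &= \mathbb{E}[\alpha_t \mid \mathcal{F}_{t-1}]
\end{aligned}$$
$P_{t|t}$ (or $P_t$) and $P_{t|t-1}$ are the covariance matrices associated to $\hat{\alpha}_{t|t}$ and $\hat{\alpha}_{t|t-1}$:
$$\begin{aligned}
P_{t|t} &= \mathbb{E}\left[(\hat{\alpha}_{t|t} - \alpha_t)(\hat{\alpha}_{t|t} - \alpha_t)^\top\right] \\
P_{t|t-1} &= \mathbb{E}\left[(\hat{\alpha}_{t|t-1} - \alpha_t)(\hat{\alpha}_{t|t-1} - \alpha_t)^\top\right]
\end{aligned}$$
These different quantities are calculated thanks to the Kalman filter, which consists in a recursive algorithm¹⁹ (Harvey, 1990):
$$\begin{cases}
\hat{\alpha}_{t|t-1} = T_t \hat{\alpha}_{t-1|t-1} + c_t \\
P_{t|t-1} = T_t P_{t-1|t-1} T_t^\top + R_t Q_t R_t^\top \\
\hat{y}_{t|t-1} = Z_t \hat{\alpha}_{t|t-1} + d_t \\
v_t = y_t - \hat{y}_{t|t-1} \\
F_t = Z_t P_{t|t-1} Z_t^\top + H_t \\
\hat{\alpha}_{t|t} = \hat{\alpha}_{t|t-1} + P_{t|t-1} Z_t^\top F_t^{-1} v_t \\
P_{t|t} = \left(I_m - P_{t|t-1} Z_t^\top F_t^{-1} Z_t\right) P_{t|t-1}
\end{cases}$$
where $\hat{y}_{t|t-1} = \mathbb{E}[y_t \mid \mathcal{F}_{t-1}]$ is the best estimator of $y_t$ given the available information until time $t - 1$, $v_t$ is the innovation process and $F_t$ is the associated covariance matrix.

¹⁸ The presentation is based on the book of Harvey (1990).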
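The recursion is easiest to read in the scalar, time-invariant case ($n = m = p = 1$), where every matrix product becomes an ordinary product. The sketch below follows the seven equations in that special case; the function name, the argument names and the toy observations are assumptions.

```python
def kalman_filter_scalar(ys, Z, d, H, T, c, R, Q, a0, P0):
    """Scalar Kalman filter for y_t = Z a_t + d + eps_t and a_t = T a_{t-1} + c + R eta_t.

    Returns one tuple (a_pred, P_pred, v, F, a_filt, P_filt) per observation.
    """
    a, P = a0, P0
    out = []
    for y in ys:
        a_pred = T * a + c                    # state prediction
        P_pred = T * P * T + R * Q * R        # prediction variance
        y_pred = Z * a_pred + d               # observation prediction
        v = y - y_pred                        # innovation
        F = Z * P_pred * Z + H                # innovation variance
        a = a_pred + P_pred * Z * v / F       # state update
        P = (1.0 - P_pred * Z * Z / F) * P_pred  # variance update
        out.append((a_pred, P_pred, v, F, a, P))
    return out


# Hypothetical noisy level: a random walk observed with measurement noise.
ys = [1.2, 0.8, 1.5, 1.1, 0.9]
for step in kalman_filter_scalar(ys, Z=1.0, d=0.0, H=0.5, T=1.0, c=0.0, R=1.0, Q=0.1,
                                 a0=0.0, P0=10.0):
    print(round(step[4], 4), round(step[5], 4))
```

With a large `P0` the first filtered value is close to the first observation, which is the usual way to express a vague prior on the initial state.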


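Remark 126 Harvey (1990) showed that we can directly calculate $\hat{\alpha}_{t+1|t}$ from $\hat{\alpha}_{t|t-1}$:
$$\hat{\alpha}_{t+1|t} = (T_{t+1} - K_t Z_t)\, \hat{\alpha}_{t|t-1} + K_t y_t + c_{t+1} - K_t d_t$$
where $K_t = T_{t+1} P_{t|t-1} Z_t^\top F_t^{-1}$ is the gain matrix. It follows that:
$$\hat{\alpha}_{t+1|t} = T_{t+1} \hat{\alpha}_{t|t-1} + c_{t+1} + K_t \left(y_t - Z_t \hat{\alpha}_{t|t-1} - d_t\right)$$
By recognizing the innovation process $v_t$, we obtain the following innovation representation:
$$\begin{cases}
y_t = Z_t \hat{\alpha}_{t|t-1} + d_t + v_t \\
\hat{\alpha}_{t+1|t} = T_{t+1} \hat{\alpha}_{t|t-1} + c_{t+1} + K_t v_t
\end{cases}$$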


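Kalman smoothing. Let $t^\star \leq T$ be a date before the final observation of the sample. We note:
$$\hat{\alpha}_{t|t^\star} = \mathbb{E}[\alpha_t \mid \mathcal{F}_{t^\star}]$$
and:
$$P_{t|t^\star} = \mathbb{E}\left[(\hat{\alpha}_{t|t^\star} - \alpha_t)(\hat{\alpha}_{t|t^\star} - \alpha_t)^\top\right]$$
for all $t \leq t^\star$. By construction, $\hat{\alpha}_{t^\star|t^\star}$ and $P_{t^\star|t^\star}$ are exactly equal to the quantities calculated with the Kalman filter. We can show that the smoothed estimates for $t < t^\star$ are given by the Kalman smoother algorithm:
$$\begin{cases}
S = P_{t|t} T_{t+1}^\top P_{t+1|t}^{-1} \\
\hat{\alpha}_{t|t^\star} = \hat{\alpha}_{t|t} + S \left(\hat{\alpha}_{t+1|t^\star} - \hat{\alpha}_{t+1|t}\right) \\
P_{t|t^\star} = P_{t|t} + S \left(P_{t+1|t^\star} - P_{t+1|t}\right) S^\top
\end{cases}$$
While the Kalman filter is a forward algorithm²⁰, the Kalman smoother is a backward algorithm²¹.

¹⁹ The algorithm is initialized with values $\hat{\alpha}_{0|0} = \hat{\alpha}_0$ and $P_{0|0} = P_0$.

²⁰ The algorithm proceeds recursively, starting with values $\hat{\alpha}_0$ and $P_0$ at the starting date $t_0$, and moving forward in time toward the ending date.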

²¹ The algorithm proceeds recursively, starting with values $\hat{\alpha}_{t^\star|t^\star}$ and $P_{t^\star|t^\star}$ at the ending date $t^\star$, and moving backward in time toward the initial date.

Estimation of unknown parameters. In many cases, the state space model depends on certain parameters that are unknown. Given a set $\theta$ of values for these unknown parameters, the Kalman filter may be applied to estimate the state vector $\alpha_t$. We have:
$$v_t \sim \mathcal{N}(0, F_t)$$
where $v_t = y_t - \hat{y}_{t|t-1}$ is the innovation at time $t$ and $F_t = Z_t P_{t|t-1} Z_t^\top + H_t$ is the covariance matrix. If we change $\theta$ and we run the Kalman filter, we will obtain other values of $v_t$ and $F_t$, meaning that $v_t$ and $F_t$ depend on $\theta$. This is why we can write $v_t(\theta)$ and $F_t(\theta)$. We deduce that the log-likelihood function of the sample $\{y_1, \ldots, y_T\}$ is equal to:
$$\ell(\theta) = -\frac{nT}{2} \ln(2\pi) - \frac{1}{2} \sum_{t=1}^{T} \left(\ln|F_t(\theta)| + v_t(\theta)^\top F_t(\theta)^{-1} v_t(\theta)\right)$$
We can then estimate the vector $\theta$ of unknown parameters by the method of maximum likelihood:
$$\hat{\theta} = \arg\max\, \ell(\theta)$$
Once the ML estimate $\hat{\theta}$ is found, we can run again²² the Kalman filter to estimate the other quantities $\hat{\alpha}_{t|t-1}$, $\hat{\alpha}_{t|t}$, $P_{t|t-1}$ and $P_{t|t}$.
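In the scalar case $n = 1$, the log-likelihood only requires the innovations and their variances produced by one Kalman filter run. The sketch below evaluates it from two lists; the function name and the numerical values are hypothetical, and the filter run that would produce these lists for a given $\theta$ is not repeated here.

```python
import math


def gaussian_loglik(innovations, variances):
    """Log-likelihood -T/2 ln(2 pi) - 1/2 sum_t (ln F_t + v_t^2 / F_t) for n = 1."""
    T = len(innovations)
    ll = -0.5 * T * math.log(2.0 * math.pi)
    for v, F in zip(innovations, variances):
        ll -= 0.5 * (math.log(F) + v * v / F)
    return ll


# Hypothetical innovations v_t and innovation variances F_t from one filter run.
v = [0.4, -0.3, 0.7, -0.1]
F = [1.5, 1.1, 0.9, 0.8]
print(gaussian_loglik(v, F))
```

A maximum likelihood routine would wrap this evaluation in a function of $\theta$ that reruns the filter at each call, which is why many filter runs are needed during the optimization.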


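Time-invariant state space model. We consider the time-invariant model:
$$\begin{cases}
y_t = Z \alpha_t + d + \varepsilon_t \\
\alpha_t = T \alpha_{t-1} + c + R \eta_t
\end{cases} \qquad (10.34)$$
where $\varepsilon_t \sim \mathcal{N}(0, H)$ and $\eta_t \sim \mathcal{N}(0, Q)$. If the state space model converges to a steady state, the estimators $(\hat{\alpha}_\infty, P_\infty)$ must satisfy the following equations:
$$\begin{cases}
\hat{\alpha}_\infty = T \hat{\alpha}_\infty + c \\
P_\infty = T P_\infty T^\top + R Q R^\top
\end{cases}$$
It follows that the solution is:
$$\begin{cases}
\hat{\alpha}_\infty = (I_m - T)^{-1} c \\
\operatorname{vec}(P_\infty) = \left(I_{m^2} - T \otimes T\right)^{-1} \operatorname{vec}\left(R Q R^\top\right)
\end{cases} \qquad (10.35)$$
where $\hat{\alpha}_\infty$ and $P_\infty$ are the unconditional mean and covariance matrix of $\alpha_t$. Without any knowledge of the initial position $\alpha_0$, the best way to define $\hat{\alpha}_0$ and $P_0$ is then to use the steady state:
$$\begin{cases}
\hat{\alpha}_0 = \hat{\alpha}_\infty \\
P_0 = P_\infty
\end{cases}$$
In many state space models, the matrices $T$, $c$, $R$ and $Q$ depend on unknown parameters $\theta$, implying that $\hat{\alpha}_\infty$ and $P_\infty$ also depend on $\theta$. This means that when maximizing the log-likelihood function, the Kalman filter is initialized by values of $\hat{\alpha}_0$ and $P_0$ that depend on $\theta$. This is the main difference with the time-varying state space model, for which the Kalman filter is initialized by fixed values of $\hat{\alpha}_0$ and $P_0$.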


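We consider the AR(1) process $y_t = \mu + \phi_1 y_{t-1} + \varepsilon_t$ where $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$. The state space form is:
$$\begin{cases}
y_t = Z \alpha_t \\
\alpha_t = T \alpha_{t-1} + c + R \eta_t
\end{cases}$$
where $\alpha_t = (y_t, \varepsilon_t)$, $Z = \begin{pmatrix} 1 & 0 \end{pmatrix}$, $T = \begin{pmatrix} \phi_1 & 0 \\ 0 & 0 \end{pmatrix}$, $c = \begin{pmatrix} \mu \\ 0 \end{pmatrix}$, $R = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $Q = \sigma_\varepsilon^2$. It follows that:
$$\hat{\alpha}_\infty = (I_2 - T)^{-1} c = \begin{pmatrix} \dfrac{\mu}{1 - \phi_1} \\ 0 \end{pmatrix}$$
and:
$$\begin{aligned}
\operatorname{vec}(P_\infty) &= \left(I_4 - T \otimes T\right)^{-1} \operatorname{vec}\left(R Q R^\top\right) \\
&= \begin{pmatrix}
1 - \phi_1^2 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}^{-1}
\begin{pmatrix} \sigma_\varepsilon^2 \\ \sigma_\varepsilon^2 \\ \sigma_\varepsilon^2 \\ \sigma_\varepsilon^2 \end{pmatrix} \\
&= \begin{pmatrix} (1 - \phi_1^2)^{-1} \sigma_\varepsilon^2 \\ \sigma_\varepsilon^2 \\ \sigma_\varepsilon^2 \\ \sigma_\varepsilon^2 \end{pmatrix}
\end{aligned}$$
and:
$$P_\infty = \begin{pmatrix}
\dfrac{\sigma_\varepsilon^2}{1 - \phi_1^2} & \sigma_\varepsilon^2 \\
\sigma_\varepsilon^2 & \sigma_\varepsilon^2
\end{pmatrix}$$
More generally, for an ARMA(p,q) process, we calculate $\hat{\alpha}_0$ and $P_0$ by using Equation (10.35) with the SSM form given on page 647.

²² We say again, because computing the log-likelihood function requires one Kalman filter run, implying that many Kalman filter runs are used for maximizing the log-likelihood function.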

Since the SSM (10.34) can be viewed as a VAR(1) process, we can easily calculate the autocovariance matrices $\Gamma_y(k) = \operatorname{cov}(y_t, y_{t-k})$. We have $\Gamma_y(k) = Z T^k \Gamma_\alpha(0) Z^\top$ where $\Gamma_\alpha(0) = P_\infty$. We also deduce that the responses to forecast errors and orthogonal impulses are equal to $\Lambda_k = Z T^k R$ and $\Delta_k = Z T^k R P(Q)$ where $P(Q)$ is the Cholesky decomposition matrix of $Q$. It follows that the long-term multipliers are $\Psi_\infty = Z (I - T)^{-1} R$ and $\Xi_\infty = Z (I - T)^{-1} R P(Q)$.


10.2.2.2 Some applications 


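The recursive least squares. We consider the linear model $y_t = x_t^\top \beta + u_t$ where $x_t$ is a vector of $K$ exogenous variables and $u_t \sim \mathcal{N}(0, \sigma^2)$. This model corresponds to the following state space model:
$$\begin{cases}
y_t = x_t^\top \beta_t + u_t \\
\beta_t = \beta_{t-1}
\end{cases}$$
where $\beta_t$ is the state vector. Using the Kalman filter, we obtain $\hat{\beta}_{t|t-1} = \hat{\beta}_{t-1|t-1}$, $P_{t|t-1} = P_{t-1|t-1}$, $\hat{y}_{t|t-1} = x_t^\top \hat{\beta}_{t|t-1}$, $v_t = y_t - \hat{y}_{t|t-1}$ and $F_t = x_t^\top P_{t|t-1} x_t + \sigma^2$. The updating equations are:
$$\begin{cases}
\hat{\beta}_{t|t} = \hat{\beta}_{t|t-1} + P_{t|t-1} x_t F_t^{-1} v_t \\
P_{t|t} = \left(I_K - P_{t|t-1} x_t F_t^{-1} x_t^\top\right) P_{t|t-1}
\end{cases}$$
We consider the $t \times 1$ vector $Y_t = (y_1, \ldots, y_t)$ and the $t \times K$ matrix $X_t = (x_1, \ldots, x_t)^\top$. We have $X_t^\top X_t = X_{t-1}^\top X_{t-1} + x_t x_t^\top$ and $X_t^\top Y_t = X_{t-1}^\top Y_{t-1} + x_t y_t$. If we assume that $P_{t-1|t-1} = \sigma^2 \left(X_{t-1}^\top X_{t-1}\right)^{-1}$, we deduce that:
$$F_t = \sigma^2 \left(1 + x_t^\top \left(X_{t-1}^\top X_{t-1}\right)^{-1} x_t\right)$$
It follows that²³:
$$\begin{aligned}
P_{t|t} &= \left(I_K - P_{t|t-1} x_t F_t^{-1} x_t^\top\right) P_{t|t-1} \\
&= \left(I_K - \frac{\sigma^2 \left(X_{t-1}^\top X_{t-1}\right)^{-1} x_t x_t^\top}{\sigma^2 \left(1 + x_t^\top \left(X_{t-1}^\top X_{t-1}\right)^{-1} x_t\right)}\right) \sigma^2 \left(X_{t-1}^\top X_{t-1}\right)^{-1} \\
&= \sigma^2 \left(\left(X_{t-1}^\top X_{t-1}\right)^{-1} - \frac{\lambda_t \lambda_t^\top}{1 + \lambda_t^\top x_t}\right) \\
&= \sigma^2 \left(X_t^\top X_t\right)^{-1}
\end{aligned}$$
where $\lambda_t = \left(X_{t-1}^\top X_{t-1}\right)^{-1} x_t$. This proves that the assumption $P_{t|t} = \sigma^2 \left(X_t^\top X_t\right)^{-1}$ is true.

Since we have:
$$P_{t|t-1} x_t F_t^{-1} v_t = \frac{\lambda_t v_t}{1 + \lambda_t^\top x_t}$$
the Kalman filter reduces to the following set of equations:
$$\begin{cases}
v_t = y_t - x_t^\top \hat{\beta}_{t-1} \\
\lambda_t = \left(X_{t-1}^\top X_{t-1}\right)^{-1} x_t \\
F_t = \sigma^2 \left(1 + \lambda_t^\top x_t\right) \\
\hat{\beta}_t = \hat{\beta}_{t-1} + P_{t-1} x_t F_t^{-1} v_t \\
P_t = P_{t-1} - \sigma^2 \left(1 + \lambda_t^\top x_t\right)^{-1} \lambda_t \lambda_t^\top
\end{cases}$$
where $\hat{\beta}_t = \hat{\beta}_{t|t}$ and $P_t = P_{t|t}$. These equations define exactly the system of the recursive least squares (Spanos, 1986) and avoid the inversion of matrices $X_t^\top X_t$ to compute the RLS estimates $\hat{\beta}_t = \left(X_t^\top X_t\right)^{-1} X_t^\top Y_t$ for $t = 1, \ldots, T$.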
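The recursion can be sketched in plain Python by propagating $A_t = \left(X_t^\top X_t\right)^{-1}$, so that $P_t = \sigma^2 A_t$ and the variance $\sigma^2$ cancels out of the coefficient update. The initialization is an assumption made for this illustration: a diffuse start $A_0 = \kappa I_K$ with a large hypothetical $\kappa$, which gives the OLS estimate up to a negligible ridge term. The function names and the data are also hypothetical.

```python
def rls_update(beta, a_inv, x, y):
    """One recursive least squares step.

    a_inv is (X'X)^{-1} before the new observation; lam is lambda_t = a_inv x_t.
    """
    k = len(x)
    lam = [sum(a_inv[i][j] * x[j] for j in range(k)) for i in range(k)]
    denom = 1.0 + sum(lam[i] * x[i] for i in range(k))
    v = y - sum(x[i] * beta[i] for i in range(k))          # innovation
    beta_new = [beta[i] + lam[i] * v / denom for i in range(k)]
    a_new = [[a_inv[i][j] - lam[i] * lam[j] / denom for j in range(k)]
             for i in range(k)]                            # Sherman-Morrison update
    return beta_new, a_new, v


def recursive_least_squares(xs, ys, kappa=1e6):
    """Run the recursion from a diffuse start (assumed initialization)."""
    k = len(xs[0])
    beta = [0.0] * k
    a_inv = [[kappa if i == j else 0.0 for j in range(k)] for i in range(k)]
    path = []
    for x, y in zip(xs, ys):
        beta, a_inv, _ = rls_update(beta, a_inv, x, y)
        path.append(beta)
    return path


# Hypothetical data: constant and one regressor, y = 1 + 2 x plus small deviations.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
for beta in recursive_least_squares(xs, ys):
    print([round(b, 3) for b in beta])
```

Each step costs $O(K^2)$ operations and never inverts a matrix, which is the computational benefit mentioned above.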

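At the terminal date, we have:
$$\hat{\beta}_T = \hat{\beta}_{T|T} = \hat{\beta}_{\mathrm{OLS}}$$
Since we have $S = P_{t|t} T_{t+1}^\top P_{t+1|t}^{-1} = P_{t|t} P_{t|t}^{-1} = I_K$, the Kalman smoother gives:
$$\hat{\beta}_{t|T} = \hat{\beta}_{t|t} + \left(\hat{\beta}_{t+1|T} - \hat{\beta}_{t+1|t}\right) = \hat{\beta}_{t+1|T}$$
and:
$$P_{t|T} = P_{t|t} + \left(P_{t+1|T} - P_{t+1|t}\right) = P_{t+1|T}$$
We conclude that $\hat{\beta}_{t|T} = \hat{\beta}_{\mathrm{OLS}} = \left(X_T^\top X_T\right)^{-1} X_T^\top Y_T$ and $P_{t|T} = \operatorname{cov}(\hat{\beta}_{\mathrm{OLS}}) = \sigma^2 \left(X_T^\top X_T\right)^{-1}$. These results are easy to understand because the best estimator given all the information is the OLS estimator applied to the full sample.

²³ The last identity is obtained thanks to the Sherman-Morrison-Woodbury formula. We have:
$$\left(X_t^\top X_t\right)^{-1} = \left(X_{t-1}^\top X_{t-1}\right)^{-1} - \frac{\lambda_t \lambda_t^\top}{1 + \lambda_t^\top x_t}$$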


In order to illustrate the benefits of recursive least squares, we simulate the following 
model: 


= f Q4+t+3a,+um ift < 100 
Yt =) 104+t+3a,+% ift>100 


where $u_t \sim \mathcal{N}(0, 1)$. The exogenous variable $x_t$ is simulated from the distribution function $\mathcal{U}_{[0,5]}$. In Figure 10.10, we report the RLS estimated values of the constant $c_t$ of the linear model $y_t = c_t + \beta_t x_t + u_t$. We observe a behavioral change of the estimate when $t > 100$. If we consider the evolution of the innovation process $v_t$ and the corresponding 99% confidence interval, we verify a structural change in the model. We generally identify trend breaks by using CUSUM and CUSUMSQ tests (Brown et al., 1975). Let $w_t = \left(1 + \lambda_t^\top x_t\right)^{-1/2} v_t$ be the normalized innovation. Under the assumption $H_0 : \beta_t = \beta_{t-1}$, the CUSUM statistic defined by $W_t = s^{-1} \sum_{i=K+1}^{t} w_i$ where $s^2 = (T - K)^{-1} \sum_{t=1}^{T} \left(y_t - x_t^\top \hat{\beta}_T\right)^2$ follows the distribution function $\mathcal{N}(0, t - K)$. The CUSUMSQ statistic corresponds to $V_t = \sum_{i=K+1}^{t} w_i^2 \big/ \sum_{i=K+1}^{T} w_i^2$ and has a beta distribution $\mathcal{B}\left((t - K)/2, (T - t)/2\right)$ under $H_0$.


FIGURE 10.10: CUSUM test and recursive least squares


Structural time series models With the Kalman filter, we can estimate unobserved 
components. Let us consider the deterministic trend model: 


Yt = be + Et 
fy = Bt 
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where €, ~ N (0, o2). Estimating the trend u is equivalent to estimate the parameter 8 by 
ordinary least squares and set fi, = B -t. We notice that the previous model can be written 
as: 
{ Yt = bt + Et 
He = Mt-1 + 8 


A way to introduce a stochastic trend is to add a noise $\eta_t$ in the trend equation:

$$\begin{cases} y_t = \mu_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \beta + \eta_t \end{cases}$$

where $\eta_t \sim \mathcal{N}(0, \sigma_\eta^2)$. This model is called the local level (LL) model. Using the SSM
notations, we have the following correspondence: $Z_t = 1$, $d_t = 0$, $H_t = \sigma_\varepsilon^2$, $\alpha_t = \mu_t$, $T_t = 1$,
$c_t = \beta$ and $Q_t = \sigma_\eta^2$. Once the parameters $\beta$, $\sigma_\varepsilon$ and $\sigma_\eta$ are estimated by the method of
maximum likelihood, we estimate $\hat\mu_{t|t}$ and $\hat\mu_{t|t-1}$ by using the Kalman filter. Let us now
assume that the slope of the trend is also stochastic:


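$$\begin{cases} \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t \\ \beta_t = \beta_{t-1} + \zeta_t \end{cases}$$

where $\zeta_t \sim \mathcal{N}(0, \sigma_\zeta^2)$. We obtain the local linear trend (LLT) model. The corresponding
SSM form is:

$$\begin{cases} y_t = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} \mu_t \\ \beta_t \end{pmatrix} + \varepsilon_t \\ \begin{pmatrix} \mu_t \\ \beta_t \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \mu_{t-1} \\ \beta_{t-1} \end{pmatrix} + \begin{pmatrix} \eta_t \\ \zeta_t \end{pmatrix} \end{cases}$$

where $H_t = \sigma_\varepsilon^2$ and $Q_t = \mathrm{diag}(\sigma_\eta^2, \sigma_\zeta^2)$. With the Kalman filter, we estimate both the
stochastic trend $\mu_t$ and the stochastic slope $\beta_t$. We now consider that two ARMA(1,1) time
series $y_{1,t}$ and $y_{2,t}$ have a common component $c_t$: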


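$$\begin{cases} y_{1,t} = \phi_1 y_{1,t-1} + b_1 c_{t-1} + \varepsilon_{1,t} - \theta_1 \varepsilon_{1,t-1} \\ y_{2,t} = \phi_2 y_{2,t-1} + b_2 c_{t-1} + \varepsilon_{2,t} - \theta_2 \varepsilon_{2,t-1} \\ c_t = c_{t-1} + \varepsilon_{3,t} \end{cases}$$

We obtain the following state space model:

$$\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} y_{1,t} \\ y_{2,t} \\ c_t \\ \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}$$

where:

$$\begin{pmatrix} y_{1,t} \\ y_{2,t} \\ c_t \\ \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix} = \begin{pmatrix} \phi_1 & 0 & b_1 & -\theta_1 & 0 \\ 0 & \phi_2 & b_2 & 0 & -\theta_2 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ c_{t-1} \\ \varepsilon_{1,t-1} \\ \varepsilon_{2,t-1} \end{pmatrix} + \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{pmatrix}$$

We notice that the state vector contains two observed variables ($y_{1,t}$ and $y_{2,t}$) and three
unobserved variables ($c_t$, $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$).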


Remark 127 These different models have been popularized by Harvey (1990) and are
known as structural time series models (STSM). The reader will find additional material
on these models on page 679.


Time-varying parameters If we assume that the beta coefficients are time-varying, the 
linear regression model has the following SSM form: 


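$$\begin{cases} y_t = x_t^\top \beta_t + u_t \\ \beta_t = \beta_{t-1} + \eta_t \end{cases}$$

where $u_t \sim \mathcal{N}(0, \sigma_u^2)$ and $\eta_t \sim \mathcal{N}(0, \Sigma)$. When the initial position is $\beta_0 \sim \mathcal{N}(\hat\beta_0, P_0)$, the
Kalman filter equations are:

$$\begin{aligned} \hat\beta_{t|t-1} &= \hat\beta_{t-1|t-1} \\ P_{t|t-1} &= P_{t-1|t-1} + \Sigma \\ v_t &= y_t - x_t^\top \hat\beta_{t|t-1} \\ F_t &= x_t^\top P_{t|t-1} x_t + \sigma_u^2 \\ \hat\beta_{t|t} &= \hat\beta_{t|t-1} + \left(\frac{P_{t|t-1}}{F_t}\right) x_t v_t \\ P_{t|t} &= \left(I_K - \left(\frac{P_{t|t-1}}{F_t}\right) x_t x_t^\top\right) P_{t|t-1} \end{aligned}$$

Let us now multiply the parameters $\sigma_u^2$, $\Sigma$ and $P_0$ by the scalar $\lambda$. The Kalman filter becomes:

$$\begin{aligned} \hat\beta'_{t|t-1} &= \hat\beta'_{t-1|t-1} \\ P'_{t|t-1} &= P'_{t-1|t-1} + \lambda\Sigma \\ v'_t &= y_t - x_t^\top \hat\beta'_{t|t-1} \\ F'_t &= x_t^\top P'_{t|t-1} x_t + \lambda\sigma_u^2 \\ \hat\beta'_{t|t} &= \hat\beta'_{t|t-1} + \left(\frac{P'_{t|t-1}}{F'_t}\right) x_t v'_t \\ P'_{t|t} &= \left(I_K - \left(\frac{P'_{t|t-1}}{F'_t}\right) x_t x_t^\top\right) P'_{t|t-1} \end{aligned}$$

We obtain the following matching: $\hat\beta'_{t|t-1} = \hat\beta_{t|t-1}$, $P'_{t|t-1} = \lambda P_{t|t-1}$, $v'_t = v_t$, $F'_t = \lambda F_t$, $\hat\beta'_{t|t} = \hat\beta_{t|t}$ and $P'_{t|t} = \lambda P_{t|t}$.
This means that the scaling parameter $\lambda$ has no impact on the state estimates $\hat\beta_{t|t-1}$ and
$\hat\beta_{t|t}$.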
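The scaling invariance can be checked numerically. The following is a minimal sketch for the scalar case (one regressor), not the author's code: the function name `tvp_kalman`, the simulated data and all parameter values are illustrative assumptions. It runs the filter twice, once with $(\sigma_u^2, \Sigma, P_0)$ and once with these three quantities multiplied by $\lambda$, and compares the filtered states.

```python
import random

def tvp_kalman(y, x, sigma2_u, sigma2_eta, beta0, p0):
    """Scalar Kalman filter for y_t = x_t * beta_t + u_t, beta_t = beta_{t-1} + eta_t."""
    beta, p = beta0, p0
    filtered = []
    for y_t, x_t in zip(y, x):
        p_pred = p + sigma2_eta              # P_{t|t-1} = P_{t-1|t-1} + Sigma
        v = y_t - x_t * beta                 # innovation v_t (beta_{t|t-1} = beta_{t-1|t-1})
        f = x_t * p_pred * x_t + sigma2_u    # innovation variance F_t
        gain = p_pred * x_t / f
        beta = beta + gain * v               # beta_{t|t}
        p = (1.0 - gain * x_t) * p_pred      # P_{t|t}
        filtered.append((beta, p))
    return filtered

# Hypothetical data: a slowly drifting beta observed through a noisy regression.
rng = random.Random(0)
x = [rng.uniform(0.0, 5.0) for _ in range(200)]
beta_true, y = 1.0, []
for x_t in x:
    beta_true += rng.gauss(0.0, 0.05)
    y.append(x_t * beta_true + rng.gauss(0.0, 0.5))

lam = 7.0  # arbitrary scaling factor
base = tvp_kalman(y, x, 0.25, 0.0025, 1.0, 1.0)
scaled = tvp_kalman(y, x, lam * 0.25, lam * 0.0025, 1.0, lam * 1.0)

# The state estimates coincide, while the covariances differ by the factor lam.
max_beta_gap = max(abs(b[0] - s[0]) for b, s in zip(base, scaled))
max_p_gap = max(abs(lam * b[1] - s[1]) for b, s in zip(base, scaled))
print(max_beta_gap, max_p_gap)
```

Both gaps should be at the level of floating-point rounding, which is why only the ratio between $\Sigma$ and $\sigma_u^2$ matters for the filtered states.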

In this model, the parameters $\sigma_u^2$ and $\Sigma$ are unknown and can be estimated by the
method of maximum likelihood. The log-likelihood function is equal to:

$$\ell(\theta) = -\frac{T}{2} \ln 2\pi - \frac{1}{2} \sum_{t=1}^{T} \left(\ln F_t + \frac{v_t^2}{F_t}\right)$$

where $\theta = (\sigma_u^2, \Sigma)$. Maximizing the log-likelihood function requires specifying the initial
conditions $\hat\beta_0$ and $P_0$, which are not necessarily known. It is a bad idea to consider that
$\hat\beta_0$ and $P_0$ are also unknown parameters and to perform the maximization with respect to
$\theta = (\sigma_u^2, \Sigma, \hat\beta_0, P_0)$. Generally, this approach does not converge. It is better to try different
initial conditions and to test their impact on the ML estimates.


Let us consider an illustration provided by Roncalli and Weisang (2009). They suppose 
that a fund manager allocates between the MSCI USA index and the MSCI EMU index. 
The monthly performance $R_t$ of his portfolio is equal to:

$$R_t = w_t^{(USA)} R_t^{(USA)} + \left(1 - w_t^{(USA)}\right) R_t^{(EMU)}$$

where $R_t^{(USA)}$ and $R_t^{(EMU)}$ are the monthly returns of MSCI USA and EMU indices, and
$w_t^{(USA)}$ and $w_t^{(EMU)} = 1 - w_t^{(USA)}$ are the corresponding monthly allocations. In Figure
10.11, we report an example of the dynamic allocation and the cumulative performance of
the portfolio. The investor generally does not know the allocation process $\left(w_t^{(USA)}, w_t^{(EMU)}\right)$
since he only observes the performance of the fund and the different components. Estimating
the implied exposures is known as a tracking problem. In our example, we can specify the
following state space model:


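$$\begin{cases} R_t = \begin{pmatrix} R_t^{(USA)} & R_t^{(EMU)} \end{pmatrix} \begin{pmatrix} w_t^{(USA)} \\ w_t^{(EMU)} \end{pmatrix} + u_t \\ \begin{pmatrix} w_t^{(USA)} \\ w_t^{(EMU)} \end{pmatrix} = \begin{pmatrix} w_{t-1}^{(USA)} \\ w_{t-1}^{(EMU)} \end{pmatrix} + \eta_t \end{cases}$$

where $u_t \sim \mathcal{N}(0, \sigma_u^2)$ and $\eta_t \sim \mathcal{N}(0, \Sigma)$. Since we have simulated the allocation model in
Figure 10.11, we can estimate the parameters $\sigma_u^2$ and $\Sigma$. By construction, $\hat\sigma_u^2$ is equal to zero
and $\hat\Sigma$ is the empirical covariance matrix$^{24}$ between $w_t^{(USA)} - w_{t-1}^{(USA)}$ and $w_t^{(EMU)} - w_{t-1}^{(EMU)}$.

In the first panel in Figure 10.12, we have reported the allocation USA estimated by the
Kalman filter$^{25}$. Generally, $\Sigma$ is specified as a diagonal matrix implying that the parameter
changes are independent: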


$$\mathbb{E}\left[\left(\beta_{j,t} - \beta_{j,t-1}\right)\left(\beta_{k,t} - \beta_{k,t-1}\right)\right] = 0 \quad \text{if } j \ne k$$

This is the standard approach when specifying a time-varying parameter model. In this
case, we obtain the second panel. The results are less precise, because we don't take into
account the negative correlation between $w_t^{(USA)} - w_{t-1}^{(USA)}$ and $w_t^{(EMU)} - w_{t-1}^{(EMU)}$. In the
third panel, we estimate $\sigma_u^2$ and $\Sigma$ by the method of maximum likelihood. Again, the results
are less robust because we specify a diagonal matrix for $\Sigma$. In the last panel, we consider a
variant of the previous model:

$$\begin{cases} R_t - R_t^{(EMU)} = w_t^{(USA)} \left(R_t^{(USA)} - R_t^{(EMU)}\right) + u_t \\ w_t^{(USA)} = w_{t-1}^{(USA)} + \eta_t \end{cases}$$


The idea is to transform the set of correlated exogenous variables into a set of quasi- 
independent exogenous variables. This technique is frequently used in financial tracking 
problems. In this case, the results are very good (see the fourth panel in Figure 10.12). 


10.2.3 Cointegration and error correction models 
10.2.3.1 Nonstationarity and spurious regression 


Let us consider a random walk process: 
$$y_t = y_{t-1} + \varepsilon_t$$

where $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$. $y_t$ is a nonstationary AR(1) process. Since the lag polynomial of $y_t$
is equal to $A(L) = 1 - L$, we deduce that the root associated to the equation $A(z) = 0$ is
equal to one, and we say that $y_t$ has a unit root. We also note $y_t \sim I(1)$.


Let us now consider two time series $(x_t, y_t)$ generated by the following bivariate process:

$$\begin{cases} x_t = x_{t-1} + \eta_t \\ y_t = y_{t-1} + \varepsilon_t \end{cases}$$


24We obtain: 


«(57.9848 —57.9848 = 
am ( —57.9848 57.9848 ) ea) 


25 We assume that the initial conditions are $\hat\beta_0 = (50\%, 50\%)$ and $P_0 = 0$.
FIGURE 10.11: The tracking problem
FIGURE 10.12: Estimation of the dynamic allocation by Kalman filtering
where $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$ and $\eta_t \sim \mathcal{N}(0, \sigma_\eta^2)$ are two independent white noise processes. We are
interested in the linear regression:

$$y_t = \beta_0 + \beta_1 x_t + u_t$$


By construction, we know that $\beta_1$ is equal to zero because $x_t$ and $y_t$ are two independent
processes. We can also consider the alternative linear model:

$$y_t - y_{t-1} = \beta_0 + \beta_1 \left(x_t - x_{t-1}\right) + u_t$$

In Figure 10.13, we have reported the probability density function$^{26}$ of the OLS estimator
$\hat\beta_1$. We notice that the linear regression using the level or the first difference does not lead
to the same results. For instance, there is a 20% probability that $\hat\beta_1$ exceeds 0.5 when
using variables in levels. In these cases, we can conclude that $y_t$ is related to $x_t$, but this
relationship is spurious (Granger and Newbold, 1974). More generally, we can have the
impression that there is a strong relationship between $x_t$ and $y_t$ because they both exhibit
a trend even if they are not correlated. The spurious regression phenomenon particularly
occurs when the processes are not stationary.
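The spurious regression effect is easy to reproduce by simulation. The sketch below is an illustration under stated assumptions, not the original experiment: it uses 2000 replications, a fixed seed and the helper names `ols_slope` and `simulate_slopes`, all of which are choices made here. It regresses two independent random walks on each other in levels and in first differences and compares how often the slope estimate is large in absolute value.

```python
import random

def ols_slope(x, y):
    """OLS slope of the regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate_slopes(n_rep=2000, n_obs=100, seed=1):
    """Slope estimates for the level and the first-difference regressions."""
    rng = random.Random(seed)
    level, diff = [], []
    for _ in range(n_rep):
        x, y = [100.0], [100.0]
        for _ in range(n_obs):
            x.append(x[-1] + rng.gauss(0.0, 1.0))   # independent random walks
            y.append(y[-1] + rng.gauss(0.0, 1.0))
        dx = [b - a for a, b in zip(x, x[1:])]
        dy = [b - a for a, b in zip(y, y[1:])]
        level.append(ols_slope(x[1:], y[1:]))
        diff.append(ols_slope(dx, dy))
    return level, diff

level, diff = simulate_slopes()
share_level = sum(abs(b) > 0.5 for b in level) / len(level)
share_diff = sum(abs(b) > 0.5 for b in diff) / len(diff)
print(f"P(|beta_1| > 0.5): levels {share_level:.1%}, differences {share_diff:.1%}")
```

In levels, a large slope appears in a substantial share of the replications although the true value is zero, whereas the first-difference regression almost never produces one.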


FIGURE 10.13: Probability density function of $\hat\beta_1$ in the case of a spurious regression (curves: level regression, difference regression)


10.2.3.2 The concept of cointegration 


26 We simulate $(x_t, y_t)$ for $t = 1, \ldots, 100$ by assuming that $x_0 = y_0 = 100$, and $\sigma_\varepsilon = \sigma_\eta = 1$. For each
simulation, we compute the OLS estimate for the two models. We run 5000 Monte Carlo replications and
estimate the probability density function of $\hat\beta_1$ by using the non-parametric kernel approach.

Nonstationary processes are not limited to the random walk without drift. For example,
the process can incorporate a deterministic or a stochastic trend. If we consider the previous
process $y_t = y_{t-1} + \varepsilon_t$, the first difference $y_t - y_{t-1}$ is stationary and we will say that $y_t - y_{t-1}$
is integrated of order zero: $y_t - y_{t-1} \sim I(0)$. More generally, a process $y_t$ is called integrated
of order one if $y_t - y_{t-1}$ is stationary. Therefore, we have the following property:

$$y_t \sim I(1) \Leftrightarrow y_t - y_{t-1} \sim I(0)$$

We can extend the concept of integration to an order $d > 1$:

$$y_t \sim I(d) \Leftrightarrow (1 - L)^d y_t \sim I(0)$$

For instance, $y_t$ is integrated of order two if $y_t - y_{t-1}$ is integrated of order one and
$(y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$ is integrated of order zero.


An $n$-dimensional time series $y_t = (y_{1,t}, \ldots, y_{n,t})$ is said to be cointegrated of order $(d_1, d_2)$
and we note $y_t \sim CI(d_1, d_2)$ if each component is $I(d_1)$ and there are linear combinations$^{27}$
$\beta^\top y_t$ that are integrated of an order $d_2$ where $d_2 < d_1$. In what follows, we restrict the
analysis to $d_1 = 1$ and $d_2 = 0$, which is the most frequent case in econometrics:

$$\begin{cases} y_t \sim I(1) \\ \beta^\top y_t \sim I(0) \end{cases}$$

From an economic point of view, this implies that $\beta^\top y_t$ forms a long-run equilibrium: the
variables $y_t$ are nonstationary, but are related because a linear combination is stationary.
An example is given in Figure 10.14. The three variables $y_{1,t}$, $y_{2,t}$ and $y_{3,t}$ are integrated of
order one and exhibit some trends. However, the linear combination $z_t = 2y_{1,t} - y_{2,t} - y_{3,t}$
is stationary and moves around its mean, which is equal to 10. Therefore, it is certainly
difficult to predict the three univariate time series, because they are nonstationary. It is
easier to forecast the combination $z_t$, because it is stationary and returns to its mean in the
long run.

Let us consider the bivariate process $(x_t, y_t)$ where $x_t \sim I(1)$ and $y_t \sim I(1)$. The linear
regression $y_t = \beta_0 + \beta_1 x_t + u_t$ implies that $z_t = y_t - \beta_1 x_t = \beta_0 + u_t$ can be integrated
of order zero or one. If $z_t \sim I(1)$, there is no long-run relationship between $x_t$ and $y_t$
because $z_t$ is nonstationary. The knowledge of the process $x_t$ does not help to understand
where the process $y_t$ is located. On the contrary, if $z_t \sim I(0)$, there is a long-run stationary
relationship between $x_t$ and $y_t$. In the long run, shocks to $x_t$ and $y_t$ have permanent effects,
but $z_t$ can only deviate far from its mean in the short run. $z_t$ is also called the equilibrium
error. In economics, a famous example is the theory of the purchasing power parity (PPP).
According to the law of one price, commodity prices must be the same in two different 
countries: 

$$P_i = S P_i^\star$$

where $P_i$ and $P_i^\star$ are the prices of commodity $i$ in the home and foreign economies, and $S$
is the nominal exchange rate. Purchasing power parity is the application of the law of one
price to a large basket of goods:

$$P_t = S_t P_t^\star$$

where $P_t$ and $P_t^\star$ are the domestic and foreign prices of the basket, and $S_t$ is the nominal
exchange rate at time $t$. By taking the logarithm, we obtain:

$$p_t = s_t + p_t^\star$$

where $p_t = \ln P_t$, $p_t^\star = \ln P_t^\star$ and $s_t = \ln S_t$. In practice, market frictions, transportation
costs, taxes, etc. explain that short-run deviations from PPP are large, but exchange rates
tend toward PPP in the long run. To test this theory, econometricians demonstrate that
$p_t \sim I(1)$, $s_t \sim I(1)$ and $p_t^\star \sim I(1)$, but $(p_t, s_t, p_t^\star)$ is cointegrated.


27 We have $\beta \ne 0$.
FIGURE 10.14: Illustration of the cointegration


Remark 128 If $y_t$ has $n > 2$ components, we can have several cointegration relationships.
If there are $r$ independent cointegrating vectors with $r \le n - 1$, $y_t$ is said to be cointegrated
of order $r$. $B$ is then a matrix of dimension $n \times r$, whose columns form a basis for the space
of cointegrating vectors. We consider the following process:

$$\begin{cases} y_{1,t} = y_{2,t} - 0.5 \cdot y_{3,t} + \varepsilon_{1,t} \\ y_{2,t} = 0.4 \cdot y_{3,t} + \varepsilon_{2,t} \\ y_{3,t} = y_{3,t-1} + \varepsilon_{3,t} \end{cases}$$

where $\varepsilon_{1,t}$, $\varepsilon_{2,t}$ and $\varepsilon_{3,t}$ are three independent white noise processes. It is obvious that $y_{1,t} \sim I(1)$,
$y_{2,t} \sim I(1)$ and $y_{3,t} \sim I(1)$. Moreover, we have:

$$z_{1,t} = y_{1,t} - y_{2,t} + 0.5 \cdot y_{3,t} \sim I(0)$$

and:

$$z_{2,t} = y_{2,t} - 0.4 \cdot y_{3,t} \sim I(0)$$

The rank $r$ is then equal to 2 and we have:

$$B = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0.5 & -0.4 \end{pmatrix}$$
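As a quick numerical check of this remark, the following sketch simulates the three-variable process and applies the two cointegrating vectors (the columns of $B$). The sample size, the seed and the variable names are illustrative assumptions. By construction, the two combinations reduce exactly to the white noise terms, while $y_{3,t}$ wanders like a random walk.

```python
import random

rng = random.Random(7)
n_obs = 1000

# Columns of B: the two cointegrating vectors of the process in Remark 128.
B = [(1.0, -1.0, 0.5), (0.0, 1.0, -0.4)]

y3 = 0.0
eps1, eps2, y, z = [], [], [], []
for _ in range(n_obs):
    e1, e2, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    y3 = y3 + e3                     # random walk: the common stochastic trend
    y2 = 0.4 * y3 + e2
    y1 = y2 - 0.5 * y3 + e1
    eps1.append(e1)
    eps2.append(e2)
    y.append((y1, y2, y3))
    # z_t = B' y_t: two stationary linear combinations
    z.append(tuple(sum(b * v for b, v in zip(col, (y1, y2, y3))) for col in B))

gap1 = max(abs(z_t[0] - e) for z_t, e in zip(z, eps1))   # z_{1,t} versus eps_{1,t}
gap2 = max(abs(z_t[1] - e) for z_t, e in zip(z, eps2))   # z_{2,t} versus eps_{2,t}
range_y3 = max(v[2] for v in y) - min(v[2] for v in y)
range_z1 = max(v[0] for v in z) - min(v[0] for v in z)
print(gap1, gap2, range_y3, range_z1)
```

The gaps are zero up to rounding, so $z_{1,t}$ and $z_{2,t}$ inherit the stationarity of the noise terms even though each component of $y_t$ is integrated of order one.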


10.2.3.3 Error correction model 


We consider the bivariate example: 


$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \mu + b_1 x_t + b_2 x_{t-1} + \varepsilon_t$$

where $y_t$ is an AR(2) process that depends on the current and lagged values of $x_t$. We deduce that:

$$\begin{aligned} \Delta y_t &= (a_1 - 1)\, y_{t-1} + a_2 y_{t-2} + \mu + b_1 x_t + b_2 x_{t-1} + \varepsilon_t \\ &= (a_1 + a_2 - 1)\, y_{t-1} - a_2 \Delta y_{t-1} + \mu + b_1 \Delta x_t + (b_1 + b_2)\, x_{t-1} + \varepsilon_t \\ &= -a_2 \Delta y_{t-1} + \mu + b_1 \Delta x_t + (a_1 + a_2 - 1) \left(y_{t-1} - \frac{b_1 + b_2}{1 - a_1 - a_2}\, x_{t-1}\right) + \varepsilon_t \\ &= -a_2 \Delta y_{t-1} + \mu + b_1 \Delta x_t + \alpha z_{t-1} + \varepsilon_t \end{aligned}$$

where $\alpha = a_1 + a_2 - 1$ and $z_{t-1} = y_{t-1} + \alpha^{-1} (b_1 + b_2)\, x_{t-1}$. We notice that the short-run
dynamics of $y_t$ depends on the long-run equilibrium $z_t$. More generally, if $A'(L)\, y_t =
\mu + B'(L)\, x_t + \varepsilon_t$, Engle and Granger (1987) show that:

$$A(L)\, \Delta y_t = \mu + B(L)\, \Delta x_t + \alpha \left(y_{t-1} - \beta_0 - \beta_1 x_{t-1}\right) + \varepsilon_t$$

where $\alpha < 0$. This model incorporates an error correction mechanism (Salmon, 1982):

• if $z_{t-1} > 0$, then $y_{t-1}$ is greater than its long-run target $\beta_0 + \beta_1 x_{t-1}$, which implies
that the error correction is negative ($\alpha z_{t-1} < 0$) and $y_t$ tends to decrease;

• if $z_{t-1} < 0$, then $y_{t-1}$ is less than its long-run target $\beta_0 + \beta_1 x_{t-1}$, which implies that
the error correction is positive ($\alpha z_{t-1} > 0$) and $y_t$ tends to increase.
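The algebra of the bivariate example can be verified numerically. The sketch below uses hypothetical coefficient values ($a_1 = 0.5$, $a_2 = 0.2$, $b_1 = 0.3$, $b_2 = 0.1$, $\mu = 0.1$) and a simulated random walk for $x_t$; these are assumptions made only for the illustration. It generates $y_t$ from the level equation and checks that the error correction form reproduces $\Delta y_t$ at every date.

```python
import random

# Hypothetical coefficients (chosen so that alpha = a1 + a2 - 1 < 0).
a1, a2, b1, b2, mu = 0.5, 0.2, 0.3, 0.1, 0.1
alpha = a1 + a2 - 1.0

rng = random.Random(42)
n_obs = 500
x = [0.0]
for _ in range(n_obs):
    x.append(x[-1] + rng.gauss(0.0, 1.0))          # x_t is I(1)
eps = [rng.gauss(0.0, 0.5) for _ in range(n_obs + 1)]

# Level equation: y_t = a1 y_{t-1} + a2 y_{t-2} + mu + b1 x_t + b2 x_{t-1} + eps_t
y = [0.0, 0.0]
for t in range(2, n_obs + 1):
    y.append(a1 * y[t - 1] + a2 * y[t - 2] + mu + b1 * x[t] + b2 * x[t - 1] + eps[t])

# Error correction form of the same equation.
max_gap = 0.0
for t in range(2, n_obs + 1):
    z_lag = y[t - 1] + (b1 + b2) / alpha * x[t - 1]          # equilibrium error z_{t-1}
    ecm = (-a2 * (y[t - 1] - y[t - 2]) + mu
           + b1 * (x[t] - x[t - 1]) + alpha * z_lag + eps[t])
    max_gap = max(max_gap, abs((y[t] - y[t - 1]) - ecm))
print(alpha, max_gap)
```

The two representations agree up to rounding error, and the negative value of $\alpha$ is what pulls $y_t$ back toward its long-run target.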


The Engle-Granger model assumes that the variable $y_t$ is endogenous and the other
variable $x_t$ is exogenous. This is a restrictive hypothesis. In the general $n$-dimensional case,
we have $y_t \sim I(1)$ and $\Delta y_t \sim I(0)$. The vector error correction model (VECM) defines
the short-run dynamics of $y_t$ and corresponds to a VAR process that incorporates an error
correction mechanism$^{28}$:

$$\Phi(L)\, \Delta y_t = \mu_t + \alpha z_{t-1} + \varepsilon_t \qquad (10.36)$$

where $z_t = \beta^\top y_t$ is the long-run equilibrium.


10.2.3.4 Estimation of cointegration relationships 


The first step before estimating cointegration relationships is to verify that all the
components of $y_t$ are integrated of order one. The second step consists in testing the existence
of cointegration relationships between the different components of $y_t$. In a few cases, the
cointegration vector $\beta$ is known and given by the economic theory. If we reject the assumption
that the residuals $z_t = \beta^\top y_t$ are integrated of order one, we can then conclude that the
process $y_t$ is cointegrated and $\beta$ is a cointegration vector. In other cases, we have to estimate
the cointegration relationships using the least squares approach or the method of maximum
likelihood.


28 Suppose that $y_t$ is a VAR($p$) process:
$$\Phi'(L)\, y_t = \mu_t + \varepsilon_t$$
If $y_t$ has a unit root, we have $\Phi'(L) = \Phi'(1)\, L + (1 - L)\, \Phi(L)$ and:
$$\Phi(L)\, \Delta y_t = \mu_t - \Phi'(1)\, y_{t-1} + \varepsilon_t$$
We deduce that $\alpha z_{t-1} = -\Phi'(1)\, y_{t-1}$ or $\alpha \beta^\top = -\Phi'(1)$. This result is known as the Granger representation
theorem.
Unit root tests The most popular test is the augmented Dickey-Fuller test (ADF)
proposed by Dickey and Fuller (1979, 1981). It consists in estimating the following regression
model:

$$\Delta y_t = c_t + \phi y_{t-1} + \sum_{k=1}^{p} \phi_k \Delta y_{t-k} + \varepsilon_t \qquad (10.37)$$

where $\varepsilon_t$ is a white noise process. The original Dickey-Fuller test (DF) corresponds to the
case $p = 0$ whereas the ADF test includes an autoregressive component ($p \ge 1$). Generally,
we consider three specifications of Equation (10.37):

1. the linear regression does not include the term $c_t$;
2. $c_t$ is a constant: $c_t = \mu$;
3. $c_t$ is a constant plus a deterministic trend: $c_t = \mu + \lambda t$.

The hypotheses are $H_0 : \phi = 0$ ($y_t \sim I(1)$) and $H_1 : \phi < 0$ ($y_t \sim I(0)$). It is important
to test the three models and to specify the null and alternative hypotheses appropriately.
For instance, the third specification means that the null hypothesis is $y_t \sim I(1)$ and the
alternative hypothesis is $y_t \sim I(0)$ with a deterministic trend. The ADF test is a Student's
t test:

$$\mathrm{ADF} = t_{\hat\phi} = \frac{\hat\phi}{\sigma(\hat\phi)}$$

Depending on the specification, the critical values are noted $\tau$ (no $c_t$), $\tau_\mu$ ($c_t = \mu$) and
$\tau_\tau$ ($c_t = \mu + \lambda t$). In Table 10.9, we report the Monte Carlo critical values obtained by
David Dickey in his Ph.D. dissertation. We notice that they are different from the values
obtained for the standard t statistic. Today, critical values are calculated using the response
surface estimation approach proposed by MacKinnon (1996). This approach is very fast,
very accurate for small sample sizes and implemented in most econometric software.


TABLE 10.9: Critical values of the ADF test

Significance level    10%      5%       1%
$\tau$               -1.62    -1.94    -2.56
$\tau_\mu$           -2.57    -2.86    -3.43
$\tau_\tau$          -3.13    -3.41    -3.96
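To make the test statistic concrete, here is a minimal sketch of the Dickey-Fuller case ($p = 0$) with a constant, for which the relevant critical values are those of $\tau_\mu$. The function name `dickey_fuller_tau_mu` and the simulated series are assumptions introduced for the illustration; a production analysis would rather rely on an econometric package that also handles lag selection and MacKinnon critical values.

```python
import math
import random

def dickey_fuller_tau_mu(y):
    """t-statistic of phi in  dy_t = mu + phi * y_{t-1} + e_t  (DF case, p = 0)."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(dy)
    mean_x, mean_d = sum(lagged) / n, sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, dy))
    phi = sxy / sxx                                   # OLS estimate of phi
    mu = mean_d - phi * mean_x
    rss = sum((d - mu - phi * x) ** 2 for x, d in zip(lagged, dy))
    se_phi = math.sqrt(rss / (n - 2) / sxx)           # standard error of phi
    return phi / se_phi

rng = random.Random(123)
random_walk, stationary = [0.0], [0.0]
for _ in range(1000):
    random_walk.append(random_walk[-1] + rng.gauss(0.0, 1.0))      # I(1) series
    stationary.append(0.2 * stationary[-1] + rng.gauss(0.0, 1.0))  # I(0) series

tau_rw = dickey_fuller_tau_mu(random_walk)
tau_ar = dickey_fuller_tau_mu(stationary)
critical_5pct = -2.86   # tau_mu at the 5% level (Table 10.9)
print(f"random walk: {tau_rw:.2f}, stationary AR(1): {tau_ar:.2f}")
```

The unit root hypothesis is rejected when the statistic is below the critical value, which happens for the stationary AR(1) series; for the random walk, the statistic is typically above it.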


Remark 129 The unit root tests above are only valid if $\varepsilon_t$ is white noise. The purpose of
the ADF test ($p \ge 1$) is then to correct the DF test ($p = 0$) when the residuals are correlated
and heteroscedastic. An alternative to the ADF test is to consider the Phillips-Perron test
(PP), which is based on the linear regression associated to the Dickey-Fuller test:

$$\Delta y_t = c_t + \phi y_{t-1} + \varepsilon_t \qquad (10.38)$$

Phillips and Perron (1988) propose a modified statistic $Z_t$ of the Student's t statistic that
takes into account the Newey-West long-run standard error of $\varepsilon_t$.


While ADF and PP unit root tests consider that the null hypothesis is $y_t \sim I(1)$, the
KPSS test supposes that the null hypothesis is $y_t \sim I(0)$. This stationarity test is based on
the state space model:

$$\begin{cases} y_t = c_t + \mu_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \omega_t \end{cases}$$

where $\omega_t \sim \mathcal{N}(0, \sigma_\omega^2)$. In this case, the stationary hypothesis is $H_0 : \sigma_\omega^2 = 0$. The KPSS
test is given by:

$$\mathrm{KPSS} = T^{-2}\, \frac{\sum_{t=1}^{T} S_t^2}{\hat s^2(l)}$$

where $S_t = \sum_{s=1}^{t} \hat\varepsilon_s$, $\hat s^2(l)$ is the Newey-West long-run variance of $\varepsilon_t$ with lag number $l$
and $\hat\varepsilon_t$ is the residual $y_t - \hat c_t$. As for the ADF test, Kwiatkowski et al. (1992) propose two
tests $\eta_\mu$ and $\eta_\tau$ depending on the specification of $c_t$. The critical values of these tests are
reported in Table 10.10.


TABLE 10.10: Critical values of the KPSS test

Significance level    10%      5%       1%
$\eta_\mu$           0.347    0.463    0.739
$\eta_\tau$          0.119    0.146    0.216
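The KPSS statistic can be sketched in a few lines for the level-stationarity case ($c_t = \mu$, statistic $\eta_\mu$). The implementation below is an illustration under assumptions: the Newey-West variance uses Bartlett weights $1 - j/(l+1)$, the lag number is fixed at $l = 8$, and the function name `kpss_level` and the simulated series are hypothetical.

```python
import random

def kpss_level(y, lags):
    """KPSS statistic eta_mu: residuals are deviations from the sample mean."""
    n = len(y)
    mean = sum(y) / n
    resid = [v - mean for v in y]
    partial, sum_sq = 0.0, 0.0
    for e in resid:
        partial += e                 # partial sum S_t
        sum_sq += partial * partial
    # Newey-West long-run variance with Bartlett weights (assumed kernel)
    long_run = sum(e * e for e in resid) / n
    for j in range(1, lags + 1):
        gamma_j = sum(resid[t] * resid[t - j] for t in range(j, n)) / n
        long_run += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j
    return sum_sq / (n * n * long_run)

rng = random.Random(2024)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
random_walk, level = [], 0.0
for _ in range(1000):
    level += rng.gauss(0.0, 1.0)
    random_walk.append(level)

kpss_wn = kpss_level(white_noise, lags=8)
kpss_rw = kpss_level(random_walk, lags=8)
print(f"white noise: {kpss_wn:.3f}, random walk: {kpss_rw:.3f}")
```

Stationarity is rejected when the statistic exceeds the critical value of Table 10.10, which is the expected outcome for the random walk; for the white noise series, the statistic is typically small.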


Least squares estimation If the $n$-dimensional process $y_t$ is cointegrated, there exists
a non-zero vector $\beta = (\beta_1, \ldots, \beta_n)$ such that $\beta^\top y_t$ is stationary. Without loss of generality,
we can assume that $\beta_1$ is not equal to zero, implying that $\beta / \beta_1$ is also a cointegration
vector. Therefore, we can estimate the cointegration relationship with the following linear
regression:

$$y_{1,t} = c_t + \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t} + u_t \qquad (10.39)$$

and verifying the integration order of the residuals. If $u_t \sim I(1)$, then $y_t$ is not
cointegrated. In the other cases, $u_t \sim I(0)$ and the normalized cointegration vector is
$\hat\beta = (1, -\hat\beta_2, \ldots, -\hat\beta_n)$. We can then estimate the associated ECM:

$$\Phi_1(L)\, \Delta y_{1,t} = \mu_t + \sum_{j=2}^{n} \Phi_j(L)\, \Delta y_{j,t} + \alpha \hat z_{t-1} + \varepsilon_t$$

where $\hat z_t = \hat\beta^\top y_t$. As said previously, this two-step approach (also called the Engle-Granger
method) has two main limitations. First, it implicitly assumes that $y_{1,t}$ is endogenous and
$(y_{2,t}, \ldots, y_{n,t})$ are exogenous. Second, it is not valid if there are multiple cointegration
relationships.


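Maximum likelihood estimation If $y_t$ is a VAR($p$) process:

$$y_t = \mu_t + \sum_{i=1}^{p} \Phi_i y_{t-i} + \varepsilon_t$$

the associated VECM is:

$$\Delta y_t = \mu_t + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t \qquad (10.40)$$

where $\varepsilon_t \sim \mathcal{N}(0, \Sigma)$, $\mu_t$ is the vector component containing deterministic terms$^{29}$ (constant
and/or trends) and:

$$\underset{(n \times n)}{\Pi} = \underset{(n \times r)}{\alpha}\; \underset{(r \times n)}{\beta^\top} \qquad (10.41)$$

29 We have $\mu_t = \mu_0$ or $\mu_t = \mu_0 + \mu_1 \cdot t$ where $\mu_0$ and $\mu_1$ are $n \times 1$ vectors.

Since $0 < r = \mathrm{rank}\, \Pi < n$, $y_t$ is $I(1)$ with $r$ linearly independent cointegrating relationships
and $n - r$ common stochastic trends. Given a sample $\{y_1, \ldots, y_T\}$, the log-likelihood function
is:

$$\ell(\theta) = -\frac{nT}{2} \ln 2\pi - \frac{T}{2} \ln |\Sigma| - \frac{1}{2} \sum_{t=1}^{T} \varepsilon_t^\top \Sigma^{-1} \varepsilon_t \qquad (10.42)$$

The ML estimate $\hat\theta = \left(\hat\mu_0, \hat\mu_1, \hat\alpha, \hat\beta, \hat\Gamma_1, \ldots, \hat\Gamma_{p-1}, \hat\Sigma\right)$ is obtained by maximizing the objective
function (10.42) under the constraints (10.41).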

Johansen (1988, 1991) proposes two tests to determine the rank of $\Pi$. These tests are
based on the eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_n$ of $\Pi$. Let $H_0$ be the null hypothesis that there
are $r$ cointegration relationships ($\mathrm{rank}\, \Pi = r$) and $H_1$ be the alternative hypothesis that
there are more than $r$ cointegration relationships ($\mathrm{rank}\, \Pi > r$). The trace test of Johansen
is defined by the likelihood ratio test:

$$\begin{aligned} \mathrm{LR}_{\mathrm{trace}}(r) &= -2 \ln \Lambda \\ &= -2 \ln \frac{\sup_{H_0} L(\theta)}{\sup_{H_1} L(\theta)} \\ &= -T \sum_{i=r+1}^{n} \ln\left(1 - \hat\lambda_i\right) \end{aligned}$$

where $L(\theta) = \exp \ell(\theta)$ denotes the likelihood function and $\hat\lambda_i$ is the $i$-th eigenvalue of $\hat\Pi$. The underlying idea is the following: if $\mathrm{rank}\, \Pi = r$,
then $\hat\lambda_{r+1}, \ldots, \hat\lambda_n$ should be close to zero and the trace test has a small value. In contrast,
if $\mathrm{rank}\, \Pi > r$, the likelihood ratio should be large. Like the ADF tests, the likelihood ratio
does not have a standard chi-squared distribution and critical values must be calculated by Monte
Carlo simulations. If we would like to test the existence of $r$ versus $r + 1$ cointegration
relationships, Johansen considers the maximum eigenvalue test:

$$\begin{aligned} \mathrm{LR}_{\max}(r) &= -2 \ln \Lambda \\ &= -2 \ln \frac{\sup_{H_0} L(\theta)}{\sup_{H_1} L(\theta)} \\ &= -T \ln\left(1 - \hat\lambda_{r+1}\right) \end{aligned}$$

Again, Johansen (1988) provides critical values calculated by Monte Carlo simulations.


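Remark 130 There are different ways to estimate the eigenvalues $\hat\lambda_1 \ge \cdots \ge \hat\lambda_n$ of $\Pi$
(Johansen, 1988, 1991). It is interesting to notice that they are equal to the squares of the partial
correlations $\rho_1^2, \ldots, \rho_n^2$ between $\Delta y_t$ and $y_{t-1}$ conditionally to $\Delta y_{t-i}$ for $i = 1, \ldots, p - 1$ such
that $\rho_1^2 > \rho_2^2 > \cdots > \rho_n^2$. Indeed, if we consider the two linear regressions:

$$\Delta y_t = a_0 + \sum_{i=1}^{p-1} A_i \Delta y_{t-i} + \varepsilon_{0,t}$$

and:

$$y_{t-1} = b_0 + \sum_{i=1}^{p-1} B_i \Delta y_{t-i} + \varepsilon_{1,t}$$

we deduce from the Frisch-Waugh theorem that Equation (10.40) can be written as:

$$\varepsilon_{0,t} = \Pi \varepsilon_{1,t} + \varepsilon_t$$

Let $S_{0,0} = T^{-1} \sum_{t=1}^{T} \hat\varepsilon_{0,t} \hat\varepsilon_{0,t}^\top$, $S_{0,1} = T^{-1} \sum_{t=1}^{T} \hat\varepsilon_{0,t} \hat\varepsilon_{1,t}^\top$ and $S_{1,1} = T^{-1} \sum_{t=1}^{T} \hat\varepsilon_{1,t} \hat\varepsilon_{1,t}^\top$ be the
sample covariance matrices of $\mathrm{var}(\varepsilon_{0,t})$, $\mathrm{cov}(\varepsilon_{0,t}, \varepsilon_{1,t})$ and $\mathrm{var}(\varepsilon_{1,t})$. Johansen (1988) shows
that the eigenvalues are the solutions of the matrix equation $\left|\lambda S_{1,1} - S_{1,0} S_{0,0}^{-1} S_{0,1}\right| = 0$.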
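The eigenvalue problem of this remark can be sketched with NumPy. The example below is a simplified illustration, not a full Johansen procedure: it assumes $p = 1$ with a constant (so the two auxiliary regressions reduce to demeaning), simulates the three-variable process of Remark 128, and does not compare the statistics with tabulated critical values. The sample size, the seed and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 2000
eps = rng.standard_normal((n_sim, 3))

# Process of Remark 128: two cointegration relationships, one common trend.
y3 = np.cumsum(eps[:, 2])
y2 = 0.4 * y3 + eps[:, 1]
y1 = y2 - 0.5 * y3 + eps[:, 0]
y = np.column_stack([y1, y2, y3])

# With p = 1 and a constant, the residuals are the demeaned series.
dy = np.diff(y, axis=0)            # Delta y_t
ylag = y[:-1]                      # y_{t-1}
e0 = dy - dy.mean(axis=0)
e1 = ylag - ylag.mean(axis=0)
n_obs = e0.shape[0]

S00 = e0.T @ e0 / n_obs
S01 = e0.T @ e1 / n_obs
S11 = e1.T @ e1 / n_obs

# Solve |lambda S11 - S10 S00^{-1} S01| = 0 with S10 = S01'.
M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
lam = np.sort(np.linalg.eigvals(M).real)[::-1]

trace_stats = [-n_obs * np.sum(np.log(1.0 - lam[r:])) for r in range(3)]
max_stats = [-n_obs * np.log(1.0 - lam[r]) for r in range(3)]
print("eigenvalues:", lam)
print("trace statistics:", trace_stats)
print("max eigenvalue statistics:", max_stats)
```

Two eigenvalues are clearly away from zero and the third one is close to zero, which is consistent with a cointegration rank of 2 for this process.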
10.2.4 GARCH and stochastic volatility models


One of the main assumptions when estimating the time series model $y_t = x_t^\top \beta + \varepsilon_t$ is
that the residuals have a constant variance. However, this hypothesis is not verified when
we consider economic and financial time series. For example, Figure 10.15 represents the
monthly returns $R_t$ of the S&P 500 index between 1950 and 2017, and the square $R_t^2$ of
these returns. It is obvious that the volatility is not constant and homogeneous during this
period.


FIGURE 10.15: Monthly returns of the S&P 500 index (in %) (panels: $R_t$ and $R_t^2$, 1950-2020)


10.2.4.1 GARCH models 


Definition We consider the linear regression model:

$$y_t = x_t^\top \beta + \varepsilon_t$$

where $\varepsilon_t = \sigma_t e_t$ and $e_t$ is a centered standardized random variable $\mathcal{N}(0,1)$, which is
independent from the past values $\varepsilon_{t-i}$. Moreover, we have:

$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 \qquad (10.43)$$

where $\alpha_i \ge 0$ for all $i \ge 0$. We notice that the process $\varepsilon_t$ is not autocorrelated and the
conditional variance $\mathrm{var}_{t-1}(\varepsilon_t) = \mathbb{E}\left[\varepsilon_t^2 \mid \mathcal{F}_{t-1}\right]$ is equal to $\sigma_t^2$. Therefore, this variance
depends on the square of the past values $\varepsilon_{t-i}$, and is time-varying: an important shock
will increase the current conditional variance, and the probability to have high magnitude
shocks in the future. This process reproduces the stylized facts that are often observed in
financial time series, and has been introduced by Engle (1982) under the name autoregressive
conditional heteroscedasticity (or ARCH) model.
A natural extension of the ARCH($q$) model is to consider that the conditional variance
also depends on its past values:

$$\sigma_t^2 = \alpha_0 + \gamma_1 \sigma_{t-1}^2 + \gamma_2 \sigma_{t-2}^2 + \cdots + \gamma_p \sigma_{t-p}^2 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 \qquad (10.44)$$

where the polynomial $(1 - \gamma_1 L - \cdots - \gamma_p L^p)$ has its roots outside the unit circle. To ensure
that $\sigma_t^2 > 0$, we can impose that $\alpha_i \ge 0$ and $\gamma_i \ge 0$. This model has been first formulated
by Bollerslev (1986), and is known as a GARCH($p$,$q$) process. The process $y_t$ is stationary
if the following condition holds:

$$\sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \gamma_i < 1$$

We can then show that the unconditional mean $\mathbb{E}[\varepsilon_t]$ is equal to zero, whereas the
unconditional variance has the expression:

$$\mathbb{E}\left[\varepsilon_t^2\right] = \frac{\alpha_0}{1 - \sum_{i=1}^{q} \alpha_i - \sum_{i=1}^{p} \gamma_i}$$
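The expression of the unconditional variance can be checked by simulation. The following sketch simulates a Gaussian GARCH(1,1) process with hypothetical parameters ($\alpha_0 = 0.1$, $\alpha_1 = 0.1$, $\gamma_1 = 0.8$, so that the theoretical variance is 1) and compares the sample variance with the formula; the function name `simulate_garch11` and the seed are assumptions.

```python
import random

def simulate_garch11(alpha0, alpha1, gamma1, n_obs, seed=0):
    """Simulate eps_t = sigma_t * e_t with a GARCH(1,1) conditional variance."""
    rng = random.Random(seed)
    sigma2 = alpha0 / (1.0 - alpha1 - gamma1)    # start at the unconditional variance
    eps = []
    for _ in range(n_obs):
        e = rng.gauss(0.0, 1.0) * sigma2 ** 0.5
        eps.append(e)
        sigma2 = alpha0 + alpha1 * e * e + gamma1 * sigma2
    return eps

alpha0, alpha1, gamma1 = 0.1, 0.1, 0.8           # hypothetical parameters
assert alpha1 + gamma1 < 1.0                     # stationarity condition

eps = simulate_garch11(alpha0, alpha1, gamma1, n_obs=100_000)
theoretical_var = alpha0 / (1.0 - alpha1 - gamma1)
sample_mean = sum(eps) / len(eps)
sample_var = sum(e * e for e in eps) / len(eps)
print(f"theoretical variance {theoretical_var:.3f}, sample variance {sample_var:.3f}")
```

With a long sample, the empirical second moment is close to $\alpha_0 / (1 - \alpha_1 - \gamma_1)$ and the empirical mean is close to zero.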


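If we set $\eta_t = \varepsilon_t^2 - \sigma_t^2$, we get:

$$\varepsilon_t^2 = \alpha_0 + \sum_{i=1}^{m} \left(\alpha_i + \gamma_i\right) \varepsilon_{t-i}^2 + \eta_t - \sum_{i=1}^{p} \gamma_i \eta_{t-i} \qquad (10.45)$$

where $m = \max(p, q)$, $\gamma_i = 0$ for $i > p$ and $\alpha_i = 0$ for $i > q$. Since we have:

$$\mathbb{E}\left[\eta_t \mid \mathcal{F}_{t-1}\right] = \mathbb{E}\left[\varepsilon_t^2 - \sigma_t^2 \mid \mathcal{F}_{t-1}\right] = 0$$

and for $s \ne t$:

$$\mathbb{E}\left[\eta_s \eta_t\right] = \mathbb{E}\left[\left(\varepsilon_s^2 - \sigma_s^2\right)\left(\varepsilon_t^2 - \sigma_t^2\right)\right] = 0$$

we deduce that a GARCH($p$,$q$) process for $\varepsilon_t$ is equivalent to an ARMA($m$,$p$) process for
$\varepsilon_t^2$, where $\eta_t$ is the innovation. Using the formulation (10.45), we retrieve the formula of the
unconditional variance since we have:

$$\mathbb{E}\left[\varepsilon_t^2\right] = \alpha_0 + \sum_{i=1}^{m} \left(\alpha_i + \gamma_i\right) \mathbb{E}\left[\varepsilon_{t-i}^2\right] + \mathbb{E}\left[\eta_t\right] - \sum_{i=1}^{p} \gamma_i \mathbb{E}\left[\eta_{t-i}\right]$$

and $\mathbb{E}\left[\varepsilon_{t-i}^2\right] = \mathbb{E}\left[\varepsilon_t^2\right]$.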

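If $\sum_{i=1}^{m} \left(\alpha_i + \gamma_i\right) = \sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \gamma_i = 1$, then the process $\varepsilon_t^2$ has a unit root and
we obtain an integrated GARCH (or IGARCH) process (Engle and Bollerslev, 1986). For
instance, the IGARCH(1,1) process is equal to:

$$\begin{aligned} \sigma_t^2 &= \alpha_0 + (1 - \alpha_1)\, \sigma_{t-1}^2 + \alpha_1 \varepsilon_{t-1}^2 \\ &= \alpha_0 + \left(1 + \alpha_1 \left(e_{t-1}^2 - 1\right)\right) \sigma_{t-1}^2 \end{aligned}$$

If we assume that $\alpha_0 = 0$, we obtain:

$$\sigma_t^2 = \sigma_0^2 \prod_{i=0}^{t-1} \left(1 + \alpha_1 \left(e_i^2 - 1\right)\right)$$

and:

$$\ln \sigma_t^2 = \ln \sigma_0^2 + \sum_{i=0}^{t-1} \ln\left(1 + \alpha_1 \left(e_i^2 - 1\right)\right)$$

We deduce that if $\mathbb{E}\left[\ln\left(1 + \alpha_1 \left(e_t^2 - 1\right)\right)\right] > 0$, then $\sigma_t^2$ tends to $+\infty$. If
$\mathbb{E}\left[\ln\left(1 + \alpha_1 \left(e_t^2 - 1\right)\right)\right] < 0$, $\sigma_t^2$ goes to 0. While shocks are persistent in unit root
processes (e.g. the random walk), we see here that a shock in an IGARCH process may be
persistent or not. Therefore, the concept of volatility persistence is not obvious and
depends on the parameter values. Another important property of GARCH models is that
they are heavy-tailed processes, meaning that the kurtosis (and the even-order statistical moments)
are greater than those of the normal distribution.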
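The sign of the drift $\mathbb{E}\left[\ln\left(1 + \alpha_1 \left(e_t^2 - 1\right)\right)\right]$ can be estimated by Monte Carlo. The sketch below assumes Gaussian innovations $e_t$ and a few illustrative values of $\alpha_1$; the function name `mean_log_growth` is hypothetical. Since $\mathbb{E}\left[1 + \alpha_1 \left(e_t^2 - 1\right)\right] = 1$, Jensen's inequality implies that the drift is negative for $0 < \alpha_1 \le 1$, which the simulation illustrates.

```python
import math
import random

def mean_log_growth(alpha1, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[ln(1 + alpha1 * (e^2 - 1))] for e ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        e = rng.gauss(0.0, 1.0)
        total += math.log(1.0 + alpha1 * (e * e - 1.0))
    return total / n_draws

# Illustrative values of alpha1 (with alpha0 = 0 and gamma1 = 1 - alpha1).
drifts = {alpha1: mean_log_growth(alpha1) for alpha1 in (0.1, 0.3, 0.5)}
for alpha1, drift in drifts.items():
    print(f"alpha1 = {alpha1:.1f}: estimated drift of ln(sigma_t^2) = {drift:.4f}")
```

Under these assumptions, $\ln \sigma_t^2$ drifts downward, so the conditional variance of the driftless IGARCH(1,1) process goes to zero although $\varepsilon_t^2$ has a unit root.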


Remark 131 GARCH models have been extended to multivariate processes, non-normal 
distributions (EGARCH), etc. In practice, these models are not very useful, because they 
are not tractable and difficult to calibrate. Only ARCH(p) and GARCH(p,q) models are used 
by professionals with low orders of p and q. 


Estimation If we make the approximation $y_t \sim \mathcal{N}\left(x_t^\top \beta, \sigma_t^2\right)$, the log-likelihood of the $t$-th
observation is:

$$\ell_t(\theta) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln \sigma_t^2 - \frac{\varepsilon_t^2}{2\sigma_t^2}$$

where $\varepsilon_t = y_t - x_t^\top \beta$ is the residual. The vector of parameters is $\theta = (\beta, \alpha, \gamma)$ where
$\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_q)$ and $\gamma = (\gamma_1, \ldots, \gamma_p)$. We then define the estimator by maximizing the
log-likelihood function:

$$\hat\theta_{\mathrm{QML}} = \arg\max \sum_{t=1}^{T} \ell_t(\theta)$$

This estimator is called the quasi-maximum likelihood estimator, because we have assumed
that $\varepsilon_t$ is Gaussian. Of course, this is not true because $\sigma_t^2$ depends on past values of $\varepsilon_t$.
However, we can show that this approach is consistent and defines a 'good' estimator.
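The objective function of the quasi-maximum likelihood estimator can be written in a few lines for the GARCH(1,1) case without regressors. This is a sketch under assumptions: the recursion is initialized with the sample variance, the data are simulated with hypothetical parameters, and the function names are illustrative. In practice, the negative log-likelihood would be passed to a numerical optimizer with positivity and stationarity constraints.

```python
import math
import random

def garch11_neg_loglik(params, eps):
    """Negative Gaussian quasi-log-likelihood of a GARCH(1,1) model."""
    alpha0, alpha1, gamma1 = params
    sigma2 = sum(e * e for e in eps) / len(eps)   # initial variance (assumption)
    nll = 0.0
    for e in eps:
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + e * e / sigma2)
        sigma2 = alpha0 + alpha1 * e * e + gamma1 * sigma2   # variance of the next date
    return nll

def simulate_garch11(alpha0, alpha1, gamma1, n_obs, seed=0):
    rng = random.Random(seed)
    sigma2 = alpha0 / (1.0 - alpha1 - gamma1)
    eps = []
    for _ in range(n_obs):
        e = rng.gauss(0.0, 1.0) * math.sqrt(sigma2)
        eps.append(e)
        sigma2 = alpha0 + alpha1 * e * e + gamma1 * sigma2
    return eps

true_params = (0.1, 0.1, 0.8)                     # hypothetical parameters
eps = simulate_garch11(*true_params, n_obs=5000)

nll_true = garch11_neg_loglik(true_params, eps)
nll_wrong = garch11_neg_loglik((0.5, 0.1, 0.8), eps)   # alpha0 five times too large
print(f"negative log-likelihood: true {nll_true:.1f}, misspecified {nll_wrong:.1f}")
```

The quasi-log-likelihood is markedly better at the data-generating parameters than at the misspecified point, which is what a numerical optimizer exploits.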


Application to the S&P 500 index We consider the monthly returns $R_t$ of the S&P 500
index that have been plotted on page 664. To understand the dependence structure, we have
reported in Figure 10.16 the autocorrelation function$^{30}$ (ACF), the partial autocorrelation
function$^{31}$ (PACF) and the 95% significance test$^{32}$ of $R_t$ and $R_t^2$. We deduce that the
hypothesis that $R_t$ is not autocorrelated is accepted, but the hypothesis that $R_t^2$ is not
autocorrelated is rejected. This suggests that $R_t$ is a heteroscedastic process. Since the
Ljung-Box test$^{33}$ is not rejected for large lag values ($Q_{R_t}(100) = 126.99$ and the p-value
is 3.5%), we can assume that $R_t$ is a GARCH process:

$$\begin{cases} R_t = c + \varepsilon_t \\ \varepsilon_t \sim \mathrm{GARCH}(p, q) \end{cases}$$

Using $p = q = 1$, the quasi-maximum likelihood estimation gives:

$$\begin{cases} R_t = 69 \cdot 10^{-4} + \varepsilon_t \\ \sigma_t^2 = 8.3 \cdot 10^{-5} + 0.838 \cdot \sigma_{t-1}^2 + 0.120 \cdot \varepsilon_{t-1}^2 \end{cases}$$

All the parameters are statistically significant at the 99.9% confidence level. The annualized
value of $\sigma_t$ is given in Figure 10.17. We notice that it varies between 9% and 33.3%, whereas
the mean is equal to 14.15%. This value is close to the long-run volatility calculated with the
full period, which is equal to 14.30%. In Figure 10.17, we have also reported the standardized
residuals $e_t = \sigma_t^{-1} \varepsilon_t$. The ACF and PACF values of $e_t^2$ show that $e_t^2$ is not autocorrelated,
and validate the choice of the GARCH(1,1) model.

30 Let $y_t$ be a discrete-time series process. We reiterate that the autocorrelation function for lag $k$ is
defined by $\rho_y(k) = \gamma_y(k) / \gamma_y(0)$ where $\gamma_y(k) = \mathrm{cov}(y_t, y_{t-k})$.

31 Partial autocorrelation is the autocorrelation between $y_t$ and $y_{t-k}$ after removing any linear dependence
on $y_{t-1}, \ldots, y_{t-k+1}$. It is denoted by $\phi_y(k, k)$ and corresponds to the coefficient $\phi_{k,k}$ of the linear regression:

$$y_t - \bar y = \sum_{i=1}^{k} \phi_{k,i} \left(y_{t-i} - \bar y\right) + u_t$$

32 The standard deviation of $\rho_y(k)$ and $\phi_y(k, k)$ is approximately equal to $1/\sqrt{T}$ where $T$ is the number
of observations in the sample.
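The Ljung-Box statistic used in this diagnostic, $Q_y(s) = T(T+2) \sum_{k=1}^{s} \hat\rho_y^2(k) / (T - k)$, is simple to compute. The sketch below is an illustration on simulated series rather than on the S&P 500 data: the function names, the AR(1) coefficient of 0.8 and the seed are assumptions. It compares the statistic with 18.31, the 95% quantile of the $\chi^2(10)$ distribution.

```python
import random

def autocorrelation(y, k):
    """Sample autocorrelation of y at lag k."""
    n = len(y)
    mean = sum(y) / n
    gamma0 = sum((v - mean) ** 2 for v in y) / n
    gammak = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / n
    return gammak / gamma0

def ljung_box(y, n_lags):
    """Ljung-Box statistic Q_y(s) based on the first s = n_lags autocorrelations."""
    n = len(y)
    return n * (n + 2) * sum(
        autocorrelation(y, k) ** 2 / (n - k) for k in range(1, n_lags + 1)
    )

rng = random.Random(5)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
ar1 = [0.0]
for _ in range(499):
    ar1.append(0.8 * ar1[-1] + rng.gauss(0.0, 1.0))   # strongly autocorrelated series

q_wn = ljung_box(white_noise, 10)
q_ar = ljung_box(ar1, 10)
chi2_95_10 = 18.31   # 95% quantile of the chi-squared distribution with 10 degrees of freedom
print(f"Q(10): white noise {q_wn:.2f}, AR(1) {q_ar:.2f}")
```

The same function can be applied to squared returns or squared standardized residuals in order to detect remaining conditional heteroscedasticity.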


Remark 132 We have not discussed the choice of the lags p and q of the GARCH model. 
However, the estimation of higher order GARCH models does not improve the results of 
the GARCH(1,1) model. For example, if we consider a GARCH(2,2) model, we obtain the 
following results: 


$$\begin{cases} R_t = 66 \cdot 10^{-4} + \varepsilon_t \\ \sigma_t^2 = 8.5 \cdot 10^{-5} + 0.812 \cdot \sigma_{t-1}^2 + 0.073 \cdot \varepsilon_{t-1}^2 + 0.077 \cdot \varepsilon_{t-2}^2 \end{cases}$$

The estimate $\hat\gamma_2$ is equal to 0, and the p-values of $\hat\alpha_1$ and $\hat\alpha_2$ are equal to 16.3% and 3.4%.
Therefore, this model is less convincing than the GARCH(1,1) model, where all the
parameters are statistically significant at the 99.9% confidence level.
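As a back-of-the-envelope check, the GARCH(1,1) estimates reported above imply a long-run volatility through the unconditional variance formula $\alpha_0 / (1 - \alpha_1 - \gamma_1)$. The sketch below uses the rounded coefficients of the text, so the result is only indicative; the half-life measure $\ln 0.5 / \ln(\alpha_1 + \gamma_1)$ is an additional, commonly used summary that is not part of the original discussion.

```python
import math

# Rounded GARCH(1,1) estimates reported for the monthly S&P 500 returns.
alpha0, gamma1, alpha1 = 8.3e-5, 0.838, 0.120

persistence = alpha1 + gamma1                        # must be below 1 for stationarity
uncond_var = alpha0 / (1.0 - persistence)            # unconditional monthly variance
monthly_vol = math.sqrt(uncond_var)
annual_vol = monthly_vol * math.sqrt(12.0)           # annualization of monthly volatility
half_life = math.log(0.5) / math.log(persistence)    # months for a variance shock to halve

print(f"persistence = {persistence:.3f}")
print(f"implied long-run annualized volatility = {annual_vol:.2%}")
print(f"half-life of a variance shock = {half_life:.1f} months")
```

The implied value is about 15.4%, the same order of magnitude as the 14.15% and 14.30% figures quoted in the text; the gap is plausibly due to the rounding of the coefficients, since the result is very sensitive to $1 - \alpha_1 - \gamma_1$.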


10.2.4.2 Stochastic volatility models 


The Kalman filter approach A stochastic volatility model can be viewed as a GARCH
model, where an innovation process is introduced in the equation of the conditional variance
$\sigma_t^2$:

$$\gamma(L)\, \sigma_t^2 = \alpha(L)\, \varepsilon_t^2 + \eta_t \qquad (10.46)$$

where $\varepsilon_t = \sigma_t e_t$, $e_t$ is a process with zero mean and unit variance, and $\eta_t$ is the innovation process
with $\mathbb{E}[\eta_t] = 0$ and $\mathbb{E}[\eta_t^2] = \sigma_\eta^2$. The parameter $\sigma_\eta$ is known as the volatility of the volatility
or vovol$^{34}$. We have $\gamma(L) = 1 - \gamma_1 L - \cdots - \gamma_p L^p$ and $\alpha(L) = \alpha_0 + \alpha_1 L + \cdots + \alpha_q L^q$. In
order to ensure the positivity of the conditional variance, we may prefer to use an EGARCH
parametrization $h_t = \ln \sigma_t^2$:

$$\begin{cases} y_t = x_t^\top \beta + \varepsilon_t \\ \varepsilon_t = \exp\left(\frac{1}{2} h_t\right) \cdot e_t \\ h_t = \alpha_0 + \sum_{i=1}^{p} \gamma_i h_{t-i} + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i} + \eta_t \end{cases}$$

33 The Ljung-Box test is a statistical test of randomness based on a number $s$ of lags:

$$Q_y(s) = T (T + 2) \sum_{k=1}^{s} \frac{\hat\rho_y^2(k)}{T - k}$$

Under the null hypothesis that the data are independently distributed, we have $Q_y(s) \sim \chi^2(s)$.
34 See page 570 for its definition.
FIGURE 10.17: Diagnostic checking of the GARCH(1,1) model

The processes $e_t$ and $\eta_t$ are not necessarily normal, and we can use heavy-tail probability
distributions. However, the estimation of the model by the method of maximum likelihood
is complex because the log-likelihood is a mixture of conditional log-variances $h_t$.


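If we consider the canonical stochastic volatility model ($p = 1$ and $q = 0$), we have:

$$\begin{cases} y_t = \exp\left(\frac{1}{2} h_t\right) \cdot e_t \\ h_t = \alpha_0 + \gamma_1 h_{t-1} + \eta_t \end{cases} \qquad (10.47)$$

Since $h_t$ is an AR(1) process, we deduce that:

$$\mathbb{E}[h_t] = \frac{\alpha_0}{1 - \gamma_1}$$

and:

$$\mathrm{var}(h_t) = \frac{\sigma_\eta^2}{1 - \gamma_1^2}$$

Harvey et al. (1994) propose to define the measurement variable as $\ln y_t^2$ instead of $y_t$. We
obtain the state space model representation:

$$\begin{cases} \ln y_t^2 = c + h_t + \xi_t \\ h_t = \alpha_0 + \gamma_1 h_{t-1} + \eta_t \end{cases} \qquad (10.48)$$

where $h_t$ is the state variable, $c = \mathbb{E}\left[\ln e_t^2\right]$ and$^{35}$ $\xi_t = \ln e_t^2 - \mathbb{E}\left[\ln e_t^2\right]$. If we approximate
$\xi_t$ as a Gaussian random variable, we can estimate $h_t$ by using the Kalman filter with the
initialization $h_0 \sim \mathcal{N}\left(\dfrac{\alpha_0}{1 - \gamma_1}, \dfrac{\sigma_\eta^2}{1 - \gamma_1^2}\right)$.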
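An alternative model of Process (10.47) is:

$$\begin{cases} y_t = \exp\left(\frac{1}{2} h_t\right) \cdot e_t \\ h_t = \alpha_0 + \gamma_1 \left(h_{t-1} - \alpha_0\right) + \eta_t \end{cases}$$

or equivalently:

$$\begin{cases} y_t = \exp\left(\frac{1}{2} \alpha_0\right) \cdot \exp\left(\frac{1}{2} h_t\right) \cdot e_t \\ h_t = \gamma_1 h_{t-1} + \eta_t \end{cases} \qquad (10.49)$$

We deduce the following state space model representation:

$$\begin{cases} \ln y_t^2 = c + h_t + \xi_t \\ h_t = \gamma_1 h_{t-1} + \eta_t \end{cases} \qquad (10.50)$$

where $c = \alpha_0 + \mathbb{E}\left[\ln e_t^2\right]$ and $\xi_t = \ln e_t^2 - \mathbb{E}\left[\ln e_t^2\right]$. Again, we can estimate $h_t$ by using the
Kalman filter with the initialization $h_0 \sim \mathcal{N}\left(0, \dfrac{\sigma_\eta^2}{1 - \gamma_1^2}\right)$.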
Using the monthly returns of the S&P 500 index, we estimate the model (10.50) by 
maximum likelihood: i 
{ In (Ri — R)“ = —7.86 + hi + & 
ht = 0.93 - hi—ı +h 


where $\sigma_\eta = 21\%$. All the coefficients are significant at the 99% confidence level. In Figure
10.18, we report the annualized volatility estimated by Kalman filter and smoother. We can
compare these values with those obtained with the GARCH(1,1) model and the 12-month
historical volatilities. We notice that the stochastic volatility model produces “less noisy”
volatilities, because large shocks may be due to an increase of $h_t$, but also to a shock on $\varepsilon_t$.
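The filtering step for the state space model (10.50) can be sketched as follows. This is only an illustrative scalar Kalman filter, assuming that $\xi_t$ is treated as Gaussian with variance $\pi^2/2$ and that the parameters are known; the function name `sv_kalman_filter` and the simulated data are hypothetical and are not the procedure used to produce Figure 10.18.

```python
import numpy as np

def sv_kalman_filter(y, c, gamma1, sigma_eta):
    """Filtered h_t for  ln y_t^2 = c + h_t + xi_t,  h_t = gamma1 * h_{t-1} + eta_t.

    xi_t is treated as Gaussian with variance pi^2 / 2 (an approximation);
    y must not contain exact zeros.
    """
    z = np.log(np.asarray(y, dtype=float) ** 2)   # measurement variable ln y_t^2
    var_xi = np.pi ** 2 / 2.0
    h = 0.0                                       # h_0 ~ N(0, sigma_eta^2 / (1 - gamma1^2))
    p = sigma_eta ** 2 / (1.0 - gamma1 ** 2)
    h_filtered = np.empty(len(z))
    for t, z_t in enumerate(z):
        h_pred = gamma1 * h                       # prediction step
        p_pred = gamma1 ** 2 * p + sigma_eta ** 2
        gain = p_pred / (p_pred + var_xi)         # Kalman gain
        h = h_pred + gain * (z_t - c - h_pred)    # update step
        p = (1.0 - gain) * p_pred
        h_filtered[t] = h
    return h_filtered

# Hypothetical usage on simulated data, with the parameter values quoted in the text.
rng = np.random.default_rng(0)
n, c, gamma1, sigma_eta = 600, -7.86, 0.93, 0.21
alpha0 = c - (-0.5772156649 - np.log(2.0))        # c = alpha0 + E[ln eps_t^2]
h_true = np.zeros(n)
for t in range(1, n):
    h_true[t] = gamma1 * h_true[t - 1] + sigma_eta * rng.standard_normal()
y = np.exp(0.5 * (alpha0 + h_true)) * rng.standard_normal(n)
h_hat = sv_kalman_filter(y, c, gamma1, sigma_eta)
vol_hat = np.exp(0.5 * (alpha0 + h_hat))          # filtered volatility per period
```

Because the measurement noise $\xi_t$ has a large variance compared with $\sigma_\eta^2$, the Kalman gain is small and the filtered path reacts slowly to individual observations.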


35 Since $\varepsilon_t \sim N(0,1)$, Abramowitz and Stegun (1970) showed that $\ln\varepsilon_t^2$ has a log-chi-squared distribution
with mean $\psi(1) - \ln 2$ and variance $\pi^2/2$, where $\psi(x)$ is the digamma function.
FIGURE 10.18: Estimation of the stochastic volatility model


Remark 133 The canonical stochastic volatility model has been extended in many direc- 
tions, such as asymmetric or fat-tailed distributions (Harvey and Shephard, 1996; Broto 
and Ruiz, 2004). 


The MCMC approach Kim et al. (1998) suggest using a Markov Chain Monte Carlo
(MCMC) approach for estimating the previous model. This method is more flexible than
the Kalman filter, but it is also more time-consuming and complex. In Figure 10.19, we report
the estimates of the stochastic volatility for 4 algorithms: griddy Gibbs sampler, Random
Walk Metropolis algorithm, Metropolis-Hastings method, and Metropolis-Hastings algorithm
within griddy Gibbs. We notice that these MCMC estimates are close to the KF
estimates.


10.2.5 Spectral analysis 


Until now, we have analyzed stochastic processes in the time domain. In this section, we
focus on the frequency domain or the spectral analysis. We do not present the time-frequency
(or wavelet) analysis since this approach has not been successful in solving financial problems.


10.2.5.1 Fourier analysis 


Let $(y_t, t \in \mathbb{Z})$ be a centered stationary process. The (discrete) Fourier transform of $y_t$
is equal to:

FIGURE 10.19: MCMC estimates of the stochastic volatility model (panels: Griddy Gibbs sampler, Random Walk Metropolis, Metropolis-Hastings, MH within Griddy Gibbs)

where $\lambda \in [0, 2\pi]$. We define the inverse Fourier transform as follows:
Y= J y (A) e dA 


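The idea of Fourier analysis is to approximate the process $y_t$ by a weighted finite sum of
sine and cosine functions:

$$y_t \approx \sum_{j=0}^{n}\alpha_j\cos(\lambda_j t) + \sum_{j=0}^{n}\beta_j\sin(\lambda_j t)$$

where $\alpha_j$ and $\beta_j$ are the Fourier coefficients. Under some technical assumptions, we can
show that³⁶:

$$\begin{aligned} y_t &= \lim_{n\to\infty}\sum_{j=0}^{n}\alpha_j\cos(\lambda_j t) + \sum_{j=0}^{n}\beta_j\sin(\lambda_j t) \\ &= \int_0^{\pi}\cos(\lambda t)\,\mathrm{d}A(\lambda) + \int_0^{\pi}\sin(\lambda t)\,\mathrm{d}B(\lambda) \end{aligned}$$

As noticed by Pollock (1999), the Fourier analysis is made under the assumptions that $A(\lambda)$
and $B(\lambda)$ are two independent stochastic processes with zero mean, and non-overlapping
increments of each process are uncorrelated³⁷. Moreover, we have:

$$\mathrm{var}(\mathrm{d}A(\lambda)) = \mathrm{var}(\mathrm{d}B(\lambda)) = 2\cdot\mathrm{d}F_y(\lambda)$$

36 We have the correspondence $\mathrm{d}A(\lambda_j) = \alpha_j$ and $\mathrm{d}B(\lambda_j) = \beta_j$.
37 This means that $E[\mathrm{d}A(\lambda)] = E[\mathrm{d}B(\lambda)] = 0$, $E[\mathrm{d}A(\lambda)\cdot\mathrm{d}B(\lambda')] = 0$ for all $(\lambda,\lambda')$ and
$E[\mathrm{d}A(\lambda)\cdot\mathrm{d}A(\lambda')] = E[\mathrm{d}B(\lambda)\cdot\mathrm{d}B(\lambda')] = 0$ if $\lambda \neq \lambda'$.
$F_y(\lambda)$ is called the spectral distribution function, and its derivative $f_y(\lambda)$ is the spectral
density function. Pollock (1999) shows that:

$$\begin{aligned} y_t &= \int_0^{\pi}e^{i\lambda t}\,\mathrm{d}Z(\lambda) + \int_0^{\pi}e^{-i\lambda t}\,\mathrm{d}Z^{\star}(\lambda) \\ &= \int_{-\pi}^{\pi}e^{i\lambda t}\,\mathrm{d}Z(\lambda) \end{aligned}$$

where:

$$\mathrm{d}Z(\lambda) = \frac{\mathrm{d}A(\lambda) - i\,\mathrm{d}B(\lambda)}{2}$$

and:

$$\mathrm{d}Z^{\star}(\lambda) = \frac{\mathrm{d}A(\lambda) + i\,\mathrm{d}B(\lambda)}{2}$$

The decomposition $y_t = \int_{-\pi}^{\pi}e^{i\lambda t}\,\mathrm{d}Z(\lambda)$ is the spectral representation of $y_t$. We notice that
the processes $Z(\lambda)$ and $Z^{\star}(\lambda)$ are not independent:

$$\begin{aligned} E[\mathrm{d}Z(\lambda)\,\mathrm{d}Z^{\star}(\lambda)] &= \frac{1}{2}\mathrm{var}(\mathrm{d}A(\lambda)) \\ &= f_y(\lambda)\,\mathrm{d}\lambda \end{aligned}$$

but we verify that $E[\mathrm{d}Z(\lambda)\,\mathrm{d}Z^{\star}(\lambda')] = 0$ if $\lambda \neq \lambda'$.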


10.2.5.2 Definition of the spectral density function 


Let $(y_t, t \in \mathbb{Z})$ be a centered stationary process. We denote by $\gamma_y(k)$ the autocovariance
function. We can show that there is a function $f_y(\lambda)$ such that:

$$\gamma_y(k) = \int_{-\pi}^{\pi}f_y(\lambda)\,e^{i\lambda k}\,\mathrm{d}\lambda \qquad (10.51)$$

The function $f_y(\lambda)$ is called the spectral density of the process $y_t$. We can demonstrate that
the two following conditions are equivalent:

1. $y_t$ has the spectral density $f_y(\lambda)$.

2. There is a white noise $(\varepsilon_t, t \in \mathbb{Z})$ and a sequence $(\psi_s, s \in \mathbb{Z})$ satisfying $\sum_{s=-\infty}^{\infty}\psi_s^2 < \infty$
such that:

$$y_t = \sum_{s=-\infty}^{\infty}\psi_s\varepsilon_{t-s} \qquad (10.52)$$

In this case, the spectral density function $f_y(\lambda)$ is defined by:

$$\begin{aligned} f_y(\lambda) &= \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_y(k)\,e^{-i\lambda k} \qquad (10.53) \\ &= \frac{\gamma_y(0)}{2\pi} + \frac{1}{\pi}\sum_{k=1}^{\infty}\gamma_y(k)\cos(\lambda k) \\ &= \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_y(k)\cos(\lambda k) \end{aligned}$$

Therefore, the spectral density function contains the same information as the autocovariance
function. We can use either $f_y(\lambda)$ or $\gamma_y(k)$ to characterize a stationary stochastic
process. The only difference is that the autocovariance function is a representation in the
time domain whereas the spectral density function is a representation in the frequency domain.
This result is not surprising if we refer to the Fourier analysis of time series. Indeed,
the spectral density function is also the variance of the processes $Z(\lambda)$ and $Z^{\star}(\lambda)$.


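Remark 134 We verify that:

$$\begin{aligned} \int_{-\pi}^{\pi}f_y(\lambda)\,e^{i\lambda k}\,\mathrm{d}\lambda &= \int_{-\pi}^{\pi}\left(\frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma_y(h)\,e^{-i\lambda h}\right)e^{i\lambda k}\,\mathrm{d}\lambda \\ &= \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma_y(h)\int_{-\pi}^{\pi}e^{i\lambda(k-h)}\,\mathrm{d}\lambda \\ &= \gamma_y(k) + \sum_{h\neq k}\frac{\gamma_y(h)}{2\pi}\left[\frac{e^{i\lambda(k-h)}}{i(k-h)}\right]_{-\pi}^{\pi} \\ &= \gamma_y(k) + \sum_{h\neq k}\gamma_y(h)\,\frac{\sin(\pi(k-h))}{\pi(k-h)} \\ &= \gamma_y(k) \end{aligned}$$

because $\sin(\pi(k-h)) = 0$.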


10.2.5.3 Frequency domain localization 


The information contained in the autocovariance function is encoded differently in the
spectral density. Consider the white noise process $\varepsilon_t \sim N(0,\sigma^2)$. We have $\gamma_\varepsilon(0) = \sigma^2$ and
$\gamma_\varepsilon(k) = 0$ for $k \neq 0$. We deduce that:

$$f_\varepsilon(\lambda) = \frac{1}{2\pi}\gamma_\varepsilon(0)\cos(\lambda\cdot 0) = \frac{\sigma^2}{2\pi}$$

The spectral density of the white noise process is then a constant. It is the worst localized
signal in the frequency domain. Consider now the process $\eta_t$ such that $f_\eta(\lambda_c) = c$ and
$f_\eta(\lambda) = 0$ for $\lambda \neq \lambda_c$. It is the best localized signal in the frequency domain. Let us analyze
the cycle signal:

$$y_t = 2\sin\left(\frac{2\pi}{p}t\right)$$

In Figures 10.20 and 10.21, we represent this cycle for different periods p and the corresponding
autocorrelation function $\rho_y(k) = \gamma_y(k)/\gamma_y(0)$. We calculate the spectral density
as follows:

$$f_y(\lambda) = \frac{1}{2\pi}\sum_{k=-10000}^{10000}\gamma_y(k)\cos(\lambda k)$$


In Figure 10.22, we notice that the function $f_y(\lambda)$ can be approximated by $f_\eta(\lambda)$ where
the frequency $\lambda_c$ is equal to $2\pi/p$. This is then the inverse of the period p (normalized by
$2\pi$). We recall that 1/p is also the number of cycles per time unit. Suppose that we calculate
the spectral density of an economic cycle using monthly data, for example the business
cycle that has a 7-year period. Then the spectral density must have a high value c at the
frequency $\lambda_c$:

$$\lambda_c = \frac{2\pi}{7\times 12} = 0.0748$$

FIGURE 10.20: Time representation of the process $x_t$
FIGURE 10.21: Autocorrelation representation of the process $x_t$
The localization of low frequency phenomena will be at $\lambda$ close to 0 whereas the localization
of high frequency phenomena will be at higher frequencies $\lambda$ (close to $\pi$). $\lambda$ is also called
the harmonic frequency. Low harmonic frequencies correspond to long-term components
while high harmonic frequencies are more focused on short-term components.
FIGURE 10.22: Spectral representation of the process $x_t$


10.2.5.4 Main properties 


Independent processes Let $x_t$ and $y_t$ be two centered and independent stationary processes.
If we consider the process $z_t = x_t + y_t$, we have:

$$\begin{aligned} \gamma_{x+y}(k) &= E[z_t z_{t-k}] \\ &= E[x_t x_{t-k}] + E[y_t y_{t-k}] + E[x_t y_{t-k}] + E[y_t x_{t-k}] \\ &= \gamma_x(k) + \gamma_y(k) \end{aligned}$$

It follows that:

$$f_{x+y}(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_{x+y}(k)\cos(\lambda k)$$

The spectral density function of the sum of independent processes is then the sum of their
spectral density functions:

$$f_{x+y}(\lambda) = f_x(\lambda) + f_y(\lambda)$$


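Linear filtering Equation (10.51) shows that the autocovariance function $\gamma_y(k)$ is the
inverse Fourier transform of the spectral density function $f_y(\lambda)$, whereas Equation (10.53)
means that the spectral density function $f_y(\lambda)$ is the Fourier transform of the autocovariance
function $\gamma_y(k)$. However, we may wonder what the implication of Equation (10.52) is. We
recognize the Wold decomposition $y_t = \psi(L)\varepsilon_t$ where $\psi(L) = \sum_{s=-\infty}^{\infty}\psi_s L^s$. It follows
that the autocovariance function of $y_t$ is equal to:

$$\begin{aligned} \gamma_y(k) &= E[y_t y_{t-k}] \\ &= E\left[\left(\sum_{r=-\infty}^{\infty}\psi_r\varepsilon_{t-r}\right)\left(\sum_{s=-\infty}^{\infty}\psi_s\varepsilon_{t-k-s}\right)\right] \\ &= \sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}\psi_r\psi_s\gamma_\varepsilon(s+k-r) \end{aligned}$$

We deduce that the spectral density function is:

$$\begin{aligned} f_y(\lambda) &= \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_y(k)\,e^{-i\lambda k} \\ &= \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\left(\sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}\psi_r\psi_s\gamma_\varepsilon(s+k-r)\right)e^{-i\lambda k} \end{aligned}$$

We introduce the index $h = s + k - r$:

$$\begin{aligned} f_y(\lambda) &= \sum_{h=-\infty}^{\infty}\sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}\psi_r\psi_s\left(\frac{1}{2\pi}\gamma_\varepsilon(h)\right)e^{-i\lambda(h+r-s)} \\ &= \sum_{r=-\infty}^{\infty}\psi_r e^{-i\lambda r}\sum_{s=-\infty}^{\infty}\psi_s e^{i\lambda s}\left(\frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma_\varepsilon(h)\,e^{-i\lambda h}\right) \\ &= \psi(e^{-i\lambda})\,\psi(e^{i\lambda})\,f_\varepsilon(\lambda) \\ &= \psi(e^{-i\lambda})\,\overline{\psi(e^{-i\lambda})}\,f_\varepsilon(\lambda) \\ &= \left|\psi(e^{-i\lambda})\right|^2 f_\varepsilon(\lambda) \\ &= \frac{\sigma^2}{2\pi}\left|\psi(e^{-i\lambda})\right|^2 \end{aligned}$$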

This result is important because we obtain an analytical expression of $f_y(\lambda)$ that does not
need to use the Fourier transform of the autocovariance function $\gamma_y(k)$. Moreover, we can
generalize this calculation based on the Wold decomposition to any linear filter $\varphi(L)$:

$$y_t = \varphi(L)\,x_t$$

where $x_t$ is a stationary centered process. By using the same approach, we obtain:

$$f_y(\lambda) = \left|\varphi(e^{-i\lambda})\right|^2 f_x(\lambda)$$

Therefore, Fourier analysis transforms convolutions into multiplications. We can write the
complex transfer function $z = \varphi(e^{-i\lambda})$ in the polar form $\varphi(e^{-i\lambda}) = r(\lambda)\,e^{i\theta(\lambda)}$ where
$r(\lambda) = |z|$ and $\theta(\lambda) = \mathrm{atan2}(\mathrm{Im}(z), \mathrm{Re}(z))$. We deduce that:

$$\begin{aligned} y_t &= \int_{-\pi}^{\pi}e^{i\lambda t}\,\mathrm{d}Z_y(\lambda) \\ &= \int_{-\pi}^{\pi}\varphi(e^{-i\lambda})\,e^{i\lambda t}\,\mathrm{d}Z_x(\lambda) \\ &= \int_{-\pi}^{\pi}r(\lambda)\,e^{i(\lambda t + \theta(\lambda))}\,\mathrm{d}Z_x(\lambda) \end{aligned}$$

$r(\lambda)$ and $\theta(\lambda)$ are called the power-shift and phase-shift of the filter since they impact
respectively the amplitude and the period of the cyclical components.


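Spectral density of some useful processes Let us first consider the AR(1) process
$y_t = \phi y_{t-1} + \varepsilon_t$ where $\varepsilon_t \sim N(0,\sigma^2)$. We recall that the autocovariance function is equal
to:

$$\gamma_y(k) = \frac{\sigma^2\phi^{|k|}}{1-\phi^2}$$

The Fourier transform of $\gamma_y(k)$ gives:

$$\begin{aligned} f_y(\lambda) &= \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_y(k)\,e^{-i\lambda k} \\ &= \frac{1}{2\pi}\left(\frac{\sigma^2}{1-\phi^2}\right)\sum_{k=-\infty}^{\infty}\phi^{|k|}e^{-i\lambda k} \\ &= \frac{1}{2\pi}\left(\frac{\sigma^2}{1-\phi^2}\right)\left(1 + \sum_{k=1}^{\infty}\phi^k e^{-i\lambda k} + \sum_{k=1}^{\infty}\phi^k e^{i\lambda k}\right) \\ &= \frac{1}{2\pi}\left(\frac{\sigma^2}{1-\phi^2}\right)\left(1 + \frac{\phi e^{i\lambda}}{1-\phi e^{i\lambda}} + \frac{\phi e^{-i\lambda}}{1-\phi e^{-i\lambda}}\right) \\ &= \frac{1}{2\pi}\left(\frac{\sigma^2}{1-\phi^2}\right)\left(\frac{1-\phi e^{i\lambda}\phi e^{-i\lambda}}{(1-\phi e^{i\lambda})(1-\phi e^{-i\lambda})}\right) \\ &= \frac{\sigma^2}{2\pi\left(1-2\phi\cos\lambda+\phi^2\right)} \end{aligned}$$

To calculate the spectral density function, we can also use the result obtained for linear
filters. We have:

$$f_y(\lambda) = \left|\varphi(e^{-i\lambda})\right|^{-2} f_\varepsilon(\lambda)$$

where $\varphi(L) = 1 - \phi L$. We deduce that:

$$\begin{aligned} \left|\varphi(e^{-i\lambda})\right|^2 &= \left|1-\phi e^{-i\lambda}\right|^2 \\ &= \left|1-\phi(\cos\lambda - i\sin\lambda)\right|^2 \\ &= (1-\phi\cos\lambda)^2 + \phi^2\sin^2\lambda \\ &= 1-2\phi\cos\lambda+\phi^2 \end{aligned}$$

We obtain the same expression of the spectral density function:

$$f_y(\lambda) = \frac{\sigma^2}{2\pi}\cdot\frac{1}{1-2\phi\cos\lambda+\phi^2}$$

More generally, the spectral density function of the ARMA model $\Phi(L)\,y_t = \Theta(L)\,\varepsilon_t$ is
given by:

$$f_y(\lambda) = \frac{\sigma^2}{2\pi}\frac{\left|\Theta(e^{-i\lambda})\right|^2}{\left|\Phi(e^{-i\lambda})\right|^2}$$

In the case of the MA(1) process $y_t = \varepsilon_t - \theta\varepsilon_{t-1}$, we have:

$$f_y(\lambda) = \frac{\sigma^2}{2\pi}\left(1-2\theta\cos\lambda+\theta^2\right)$$

For an ARMA(1,1) process $y_t = \phi y_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$, we combine the AR(1) and MA(1)
filters and we obtain:

$$f_y(\lambda) = \frac{\sigma^2\left(1-2\theta\cos\lambda+\theta^2\right)}{2\pi\left(1-2\phi\cos\lambda+\phi^2\right)}$$

We now consider the process $z_t = x_t + y_t$, where $x_t = \phi_1 x_{t-1} + u_t$ is an AR(1) process and
$y_t = v_t - \theta_1 v_{t-1}$ is an MA(1) process that is independent from $x_t$. We have:

$$\begin{aligned} f_z(\lambda) &= f_x(\lambda) + f_y(\lambda) \\ &= \frac{\sigma_u^2}{2\pi}\cdot\frac{1}{1-2\phi_1\cos\lambda+\phi_1^2} + \frac{\sigma_v^2}{2\pi}\left(1-2\theta_1\cos\lambda+\theta_1^2\right) \end{aligned}$$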

In Figure 10.23, we represent the spectral density function of different processes³⁸:

• a white noise process $y_t = \varepsilon_t$;

• an AR(p) process $y_t = \sum_{i=1}^{p}\phi_i y_{t-i} + \varepsilon_t$;

• an MA(q) process $y_t = \varepsilon_t - \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}$;

• an ARMA(p,q) process $y_t = \sum_{i=1}^{p}\phi_i y_{t-i} + \varepsilon_t - \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}$; ARMA #1 corresponds
to the set of parameters $\phi_1 = 0.75$, $\phi_2 = -0.5$, $\theta_1 = 0.75$, $\theta_2 = -0.5$ and $\theta_3 = 0.25$
whereas ARMA #2 corresponds to $\phi_1 = 0.5$, $\phi_2 = 0.15$, $\theta_1 = 0.75$, $\theta_2 = -0.1$ and
$\theta_3 = 0.15$.

We notice that some processes are well-located in the frequency domain, meaning that they
are more ‘cyclical’. On the contrary, MA processes are more ‘flat’.


We now introduce the notion of stationary form, which is an important concept in
spectral analysis. Consider the following model:

$$\begin{cases} z_t = x_t + y_t \\ x_t = x_{t-1} + u_t - \theta_1 u_{t-1} \\ y_t = v_t \end{cases}$$

where $u_t \sim N(0,\sigma_u^2)$ and $v_t \sim N(0,\sigma_v^2)$. It is obvious that $z_t$ is not stationary, implying
that there is no spectral density function associated to the process $z_t$. However, we notice
that $(1-L)\,z_t$ is stationary because we have:

$$z_t - z_{t-1} = (1-\theta_1 L)\,u_t + (1-L)\,v_t$$

$S(z_t) = z_t - z_{t-1}$ is called the ‘stationary form’ of $z_t$ and we have:

$$f_{S(z)}(\lambda) = \left(1-2\theta_1\cos\lambda+\theta_1^2\right)\frac{\sigma_u^2}{2\pi} + 2\left(1-\cos\lambda\right)\frac{\sigma_v^2}{2\pi}$$

38 The standard deviation $\sigma$ of the noise $\varepsilon_t$ is set to 20%.
FIGURE 10.23: Spectral density function of ARMA processes


In the general case, if x; has a stationary form, there is a lag polynomial ọ (L) such that 

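On page 652, we have already studied structural time series models (STSM) with unobservable
components (Harvey, 1990). They generally have a state space representation.
Since they are non-stationary, we have to find the corresponding stationary form. Let us
consider the ‘local level’ model (LL):

$$\begin{cases} y_t = \mu_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \eta_t \end{cases}$$

where $\varepsilon_t \sim N(0,\sigma_\varepsilon^2)$ and $\eta_t \sim N(0,\sigma_\eta^2)$. The stationary form of $y_t$ is:

$$\begin{aligned} S(y_t) &= y_t - y_{t-1} \\ &= \eta_t + (1-L)\,\varepsilon_t \end{aligned}$$

We deduce that:

$$f_{S(y)}(\lambda) = \frac{\sigma_\eta^2 + 2\left(1-\cos\lambda\right)\sigma_\varepsilon^2}{2\pi}$$

The ‘local linear trend’ model (LLT) is given by:

$$\begin{cases} y_t = \mu_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t \\ \beta_t = \beta_{t-1} + \zeta_t \end{cases}$$

where $\varepsilon_t \sim N(0,\sigma_\varepsilon^2)$, $\eta_t \sim N(0,\sigma_\eta^2)$ and $\zeta_t \sim N(0,\sigma_\zeta^2)$. The stationary form of $y_t$ is:

$$\begin{aligned} S(y_t) &= (1-L)^2\,y_t \\ &= \zeta_{t-1} + (1-L)\,\eta_t + (1-L)^2\,\varepsilon_t \end{aligned}$$

whereas the spectral density function is:

$$f_{S(y)}(\lambda) = \frac{\sigma_\zeta^2 + 2\left(1-\cos\lambda\right)\sigma_\eta^2 + 4\left(1-\cos\lambda\right)^2\sigma_\varepsilon^2}{2\pi}$$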


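The ‘basic structural model’ (BSM) has the following expression:

$$\begin{cases} y_t = \mu_t + \gamma_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t \\ \beta_t = \beta_{t-1} + \zeta_t \\ \sum_{i=0}^{s-1}\gamma_{t-i} = \omega_t \end{cases}$$

where $\varepsilon_t \sim N(0,\sigma_\varepsilon^2)$, $\eta_t \sim N(0,\sigma_\eta^2)$, $\zeta_t \sim N(0,\sigma_\zeta^2)$ and $\omega_t \sim N(0,\sigma_\omega^2)$. The stationary
form of $y_t$ is:

$$\begin{aligned} S(y_t) &= (1-L)(1-L^s)\,y_t \\ &= (1-L)(1-L^s)\,\varepsilon_t + (1-L^s)\,\eta_t + \left(\sum_{i=1}^{s}L^i\right)\zeta_t + \left(1-2L+L^2\right)\omega_t \end{aligned}$$

It follows that the spectral density function is equal to:

$$f_{S(y)}(\lambda) = g_{(1-L)(1-L^s)}(\lambda)\frac{\sigma_\varepsilon^2}{2\pi} + g_{(1-L^s)}(\lambda)\frac{\sigma_\eta^2}{2\pi} + g_{\left(\sum_{i=1}^{s}L^i\right)}(\lambda)\frac{\sigma_\zeta^2}{2\pi} + g_{(1-2L+L^2)}(\lambda)\frac{\sigma_\omega^2}{2\pi}$$

where:

$$\begin{aligned} g_{(1-L^s)}(\lambda) &= 2\left(1-\cos s\lambda\right) \\ g_{(1-2L+L^2)}(\lambda) &= 6 - 8\cos\lambda + 2\cos 2\lambda \\ g_{\left(\sum_{i=1}^{s}L^i\right)}(\lambda) &= s + 2\sum_{j=1}^{s-1}(s-j)\cos j\lambda \\ g_{(1-L)(1-L^s)}(\lambda) &= 4\left(1-\cos\lambda\right)\left(1-\cos s\lambda\right) \end{aligned}$$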
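The closed-form expressions of the functions $g$ can be checked numerically, reading $g_{\varphi(L)}(\lambda)$ as the squared modulus $|\varphi(e^{-i\lambda})|^2$ of the transfer function (consistent with footnote 39). The helper `gain` below is a hypothetical name, and the check uses $s = 4$ as an arbitrary example.

```python
import numpy as np

def gain(coeffs, lam):
    """|phi(e^{-i lam})|^2 for the lag polynomial phi(L) = sum_k coeffs[k] L^k."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    k = np.arange(len(coeffs))
    transfer = (np.asarray(coeffs, dtype=float) * np.exp(-1j * np.outer(lam, k))).sum(axis=1)
    return np.abs(transfer) ** 2

s = 4
lam = np.linspace(0.0, np.pi, 50)

g_seasonal_diff = gain([1.0] + [0.0] * (s - 1) + [-1.0], lam)            # 1 - L^s
g_second_diff = gain([1.0, -2.0, 1.0], lam)                              # 1 - 2L + L^2
g_sum = gain([0.0] + [1.0] * s, lam)                                     # L + ... + L^s
g_product = gain([1.0, -1.0] + [0.0] * (s - 2) + [-1.0, 1.0], lam)       # (1 - L)(1 - L^s)

# Closed-form expressions given in the text
c_seasonal_diff = 2.0 * (1.0 - np.cos(s * lam))
c_second_diff = 6.0 - 8.0 * np.cos(lam) + 2.0 * np.cos(2.0 * lam)
c_sum = s + 2.0 * sum((s - j) * np.cos(j * lam) for j in range(1, s))
c_product = 4.0 * (1.0 - np.cos(lam)) * (1.0 - np.cos(s * lam))
```

Each numerical gain should agree with its closed form up to floating-point error.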


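We now consider a variant of the basic structural model:

$$y_t = \mu_t + \beta_t + \gamma_t + \varepsilon_t$$

where $\mu_t$ is the long-run component, $\beta_t$ is the mean-reverting component and $\gamma_t$ is the
seasonal component:

$$\begin{cases} \mu_t = \mu_{t-1} + \eta_t \\ \beta_t = \phi\beta_{t-1} + \zeta_t \\ \sum_{i=0}^{s-1}\gamma_{t-i} = \omega_t \end{cases}$$

where $\eta_t$, $\zeta_t$ and $\omega_t$ are independent white noise processes with variance $\sigma_\eta^2$, $\sigma_\zeta^2$ and $\sigma_\omega^2$.
$\mu_t$ is then a random walk, $\beta_t$ is an AR(1) process and $\gamma_t$ is a stochastic seasonal process
because we have $\gamma_t = \gamma_{t-s} + \omega_t - \omega_{t-1}$. If $\sigma_\omega^2 = 0$, the seasonal component is deterministic
($\gamma_t = \gamma_{t-s}$). As for the basic structural model, the stationary form is:

$$\begin{aligned} S(y_t) &= (1-L)(1-L^s)\,y_t \\ &= (1-L)(1-L^s)\,\varepsilon_t + (1-L^s)\,\eta_t + \left(\frac{1-L-L^s+L^{s+1}}{1-\phi L}\right)\zeta_t + \left(1-2L+L^2\right)\omega_t \end{aligned}$$

We deduce that:

$$f_{S(y)}(\lambda) = g_{(1-L)(1-L^s)}(\lambda)\frac{\sigma_\varepsilon^2}{2\pi} + g_{(1-L^s)}(\lambda)\frac{\sigma_\eta^2}{2\pi} + g_{\left(\frac{1-L-L^s+L^{s+1}}{1-\phi L}\right)}(\lambda)\frac{\sigma_\zeta^2}{2\pi} + g_{(1-2L+L^2)}(\lambda)\frac{\sigma_\omega^2}{2\pi}$$

where:

$$g_{\left(\frac{1-L-L^s+L^{s+1}}{1-\phi L}\right)}(\lambda) = \frac{g_{(1-L-L^s+L^{s+1})}(\lambda)}{g_{(1-\phi L)}(\lambda)}$$

and³⁹:

$$g_{(1-L-L^s+L^{s+1})}(\lambda) = 4 - 4\cos\lambda - 4\cos s\lambda + 2\cos(s-1)\lambda + 2\cos(s+1)\lambda$$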
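The ‘cycle model’ (CM) is defined by the following state space model:

$$\begin{cases} y_t = \psi_t \\ \begin{pmatrix}\psi_t \\ \psi_t^{\star}\end{pmatrix} = \rho\begin{pmatrix}\cos\lambda_c & \sin\lambda_c \\ -\sin\lambda_c & \cos\lambda_c\end{pmatrix}\begin{pmatrix}\psi_{t-1} \\ \psi_{t-1}^{\star}\end{pmatrix} + \begin{pmatrix}\kappa_t \\ \kappa_t^{\star}\end{pmatrix} \end{cases}$$

where $\kappa_t \sim N(0,\sigma_\kappa^2)$ and $\kappa_t^{\star} \sim N(0,\sigma_\kappa^2)$. Harvey (1990) showed that:

$$\psi_t = \left(\frac{1-\rho\cos\lambda_c L}{1-2\rho\cos\lambda_c L+\rho^2L^2}\right)\kappa_t + \left(\frac{\rho\sin\lambda_c L}{1-2\rho\cos\lambda_c L+\rho^2L^2}\right)\kappa_t^{\star}$$

and:

$$f_y(\lambda) = \left(\frac{1+\rho^2-2\rho\cos\lambda_c\cos\lambda}{1+\rho^4+4\rho^2\cos^2\lambda_c-4\rho\left(1+\rho^2\right)\cos\lambda_c\cos\lambda+2\rho^2\cos 2\lambda}\right)\frac{\sigma_\kappa^2}{2\pi}$$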

We have represented the spectral density function of the previous structural time series
models in Figures 10.24 and 10.25. The sets of parameters are the following:

• local level model: $\sigma_\varepsilon = 0.20$, $\sigma_\eta = 0.10$ for Model #1, $\sigma_\eta = 0.20$ for Model #2 and
$\sigma_\eta = 0.30$ for Model #3;

• local linear trend model: $\sigma_\varepsilon = 0.20$, $\sigma_\zeta = 0.10$, $\sigma_\eta = 0.10$ for Model #4, $\sigma_\eta = 0.20$ for
Model #5 and $\sigma_\eta = 0.30$ for Model #6;

• basic structural model: $\sigma_\varepsilon = 0.10$, $\sigma_\eta = 0.10$, $\sigma_\zeta = 0.10$, $\sigma_\omega = 0.10$, s = 4 for Model
#7 and s = 12 for Model #9; for Model #8, we have $\sigma_\varepsilon = 0.20$, $\sigma_\eta = 0.30$, $\sigma_\zeta = 0.10$,
$\sigma_\omega = 0.10$ and s = 4 whereas for Model #10 we have $\sigma_\varepsilon = 0.10$, $\sigma_\eta = 0.10$, $\sigma_\zeta = 0.10$,
$\sigma_\omega = 0.20$ and s = 12;

• cycle model: $\sigma_\kappa = 0.10$.

In the case of the cycle model, we verify that we obtain the spectral density function of a
pure deterministic cycle with $\lambda^{\star} = \lambda_c$ when $\rho \to 1$. When $\rho$ is small, the process is not well
localized in the frequency domain (see Figure 10.25).

39 We recall that $g_{(1-\phi L)}(\lambda) = 1 - 2\phi\cos\lambda + \phi^2$.
FIGURE 10.24: Spectral density function of LL, LLT and BSM
FIGURE 10.25: Spectral density function of the stochastic cycle model
10.2.5.5 Statistical estimation in the frequency domain


The periodogram The periodogram is an estimate of the spectral density function. For
that, we define the discrete Fourier transform (DFT) of the time series $\{y_t, t = 1,\ldots,n\}$ as
follows⁴⁰:

$$d_y(\lambda_j) = \sum_{t=1}^{n}y_t\,e^{-i\lambda_j t}$$

where $\lambda_j = 2\pi(j-1)/n$ and $j \in \{1,\ldots,n\}$. The periodogram $I_y(\lambda_j)$ for the frequency $\lambda_j$
is then equal to:

$$I_y(\lambda_j) = \frac{\left|d_y(\lambda_j)\right|^2}{2\pi n} \qquad (10.54)$$


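Under some conditions⁴¹, we can show that:

$$\lim_{n\to\infty}E\left[I_y(\lambda)\right] = f_y(\lambda)$$

It follows that $I_y(\lambda)$ is a natural estimator $\hat{f}_y(\lambda)$ of the spectral density function⁴². Indeed,
we have:

$$\begin{aligned} d_y(\lambda_j) &= \sum_{t=1}^{n}y_t\,e^{-i\lambda_j t} \\ &= \sum_{t=1}^{n}y_t\cos(\lambda_j t) - i\sum_{t=1}^{n}y_t\sin(\lambda_j t) \end{aligned}$$

and:

$$\begin{aligned} \left|d_y(\lambda_j)\right|^2 &= \left(\sum_{t=1}^{n}y_t\cos(\lambda_j t)\right)^2 + \left(\sum_{t=1}^{n}y_t\sin(\lambda_j t)\right)^2 \\ &= \sum_{s=1}^{n}\sum_{t=1}^{n}y_s y_t\cos(\lambda_j s)\cos(\lambda_j t) + \sum_{s=1}^{n}\sum_{t=1}^{n}y_s y_t\sin(\lambda_j s)\sin(\lambda_j t) \end{aligned}$$

Since we have $\cos(a-b) = \cos a\cos b + \sin a\sin b$, it follows that:

$$\left|d_y(\lambda_j)\right|^2 = \sum_{s=1}^{n}\sum_{t=1}^{n}y_s y_t\cos\left(\lambda_j(t-s)\right)$$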


40 We define the Fourier transform for some particular frequencies $\lambda_j$ because we generally use the FFT
algorithm to compute it (see Remark 135 on page 684). However, we can also define the Fourier transform
for all $\lambda \in [0, 2\pi]$.

41 In particular, we reiterate that the process must be stationary and centered.

42 In many textbooks, the normalization factor $2\pi$ in the periodogram formula is omitted, implying that
$\hat{f}_y(\lambda) = (2\pi)^{-1}I_y(\lambda)$. We prefer to adopt the convention to include the scaling factor.
We recall that the empirical covariance function is $\hat{\gamma}_y(k) = n^{-1}\sum_{t=|k|+1}^{n}y_t y_{t-|k|}$. Finally, we
obtain:

$$\begin{aligned} I_y(\lambda_j) &= \frac{1}{2\pi n}\sum_{s=1}^{n}\sum_{t=1}^{n}y_s y_t\cos\left(\lambda_j(t-s)\right) \\ &= \frac{1}{2\pi n}\sum_{k=-(n-1)}^{n-1}\sum_{t=|k|+1}^{n}y_t y_{t-|k|}\cos(\lambda_j k) \\ &= \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\hat{\gamma}_y(k)\cos(\lambda_j k) \end{aligned}$$

This formula is similar to Equation (10.53) where the theoretical autocovariance function
is replaced by the empirical autocovariance function and the sum of the infinite series is
replaced by the truncated sum. Therefore, the periodogram of the time series $y_t$ contains the
same information as the empirical autocovariance function. This is why we can retrieve
it using the inverse discrete Fourier transform (DFT):


Remark 135 In practice, we use the fast Fourier transform (FFT) and inverse fast Fourier
transform (IFFT) to compute $I_y(\lambda_j)$ and $\hat{\gamma}_y(k)$. These algorithms assume that the length n
of the time series is a power of 2 and take advantage of many symmetries of cosine and
sine functions⁴³.
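The equivalence between the periodogram (10.54) and the cosine transform of the empirical autocovariance function can be verified on a small sample. This is a minimal sketch using NumPy's `np.fft.fft`, which does not require the length to be a power of 2; the function names and the random test series are assumptions made for illustration.

```python
import numpy as np

def periodogram(y):
    """I_y(lam_j) = |d_y(lam_j)|^2 / (2 pi n) at lam_j = 2 pi (j - 1) / n, j = 1, ..., n."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = np.fft.fft(y)                      # indexes t from 0, which leaves |d| unchanged
    lam = 2.0 * np.pi * np.arange(n) / n
    return lam, np.abs(d) ** 2 / (2.0 * np.pi * n)

def periodogram_from_acov(y):
    """Same quantity computed as (1 / 2 pi) * sum_k gamma_hat(k) cos(lam_j k)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    lam = 2.0 * np.pi * np.arange(n) / n
    k = np.arange(1, n)
    acov0 = np.dot(y, y) / n
    acov = np.array([np.dot(y[j:], y[:-j]) / n for j in k])   # gamma_hat(1..n-1)
    return (acov0 + 2.0 * (acov * np.cos(np.outer(lam, k))).sum(axis=1)) / (2.0 * np.pi)

rng = np.random.default_rng(42)
y = rng.standard_normal(64)                # the series is assumed to be centered
lam, I_fft = periodogram(y)
I_acov = periodogram_from_acov(y)
```

The FFT route costs $O(n\ln n)$ operations whereas the direct autocovariance route costs $O(n^2)$, which is why the former is used in practice.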


More generally, the asymptotic probability distribution of the periodogram is a chi-squared
distribution under some assumptions⁴⁴:

$$\lim_{n\to\infty}2\,\frac{I_y(\lambda)}{f_y(\lambda)} \sim \chi_2^2$$

We retrieve the previous result:

$$\lim_{n\to\infty}E\left[I_y(\lambda)\right] = \frac{f_y(\lambda)}{2}E\left[\chi_2^2\right] = f_y(\lambda)$$

One of the drawbacks of the estimator (10.54) is its high variance:

$$\lim_{n\to\infty}\mathrm{var}\left(I_y(\lambda)\right) = \frac{f_y^2(\lambda)}{4}\cdot\mathrm{var}\left(\chi_2^2\right) = f_y^2(\lambda)$$

In particular, the variance does not go to zero when n tends to $\infty$. This is why we use in
practice the smoothed periodogram defined by:

$$I_y^{\star}(\lambda_j) = \sum_{s=-m}^{m}w_m(s)\,I_y(\lambda_{j+s})$$

where $w_m(s)$ is a smoothed window function and m is the bandwidth. The function $w_m(s)$
is equal to $W(s/m)$ where $W(u)$ is a normalized function, which is also called the spectral
window function (see Table 10.11).
FIGURE 10.26: Estimation of the spectral density function
FIGURE 10.27: Estimation of the autocorrelation function
TABLE 10.11: Spectral window functions

Name          W(u)
Bartlett      $1 - |u|$
Parzen        $\left(1 - 6|u|^2 + 6|u|^3\right)\cdot\mathbb{1}\{|u| \leq 1/2\} + 2\left(1-|u|\right)^3\cdot\mathbb{1}\{|u| > 1/2\}$
Tukey         $1 - 2a + 2a\cos(\pi u)$
Rectangular   $1$
Daniell       $(\pi u)^{-1}\sin(\pi u)$
Priestley     $3\,(\pi u)^{-3}\left(\sin(\pi u) - (\pi u)\cos(\pi u)\right)$


In Figure 10.26, we consider an AR(1) process $y_t$ with $\phi = 0.5$ and $\sigma = 20\%$. We
represent its spectral density function $f_y(\lambda)$, and then simulate the process with 1000
observations. We calculate the periodogram, but we notice that $I_y(\lambda)$ is noisy. This is why
we estimate the smoothed periodogram $I_y^{\star}(\lambda_j)$ with the Tukey window, whose parameters
are a = 0.25 and m = 50. We obtain a less noisy estimator. In order to illustrate the impact
of smoothing, we estimate the autocorrelation function. We first apply the inverse discrete
Fourier transform to $I_y(\lambda_j)$ and $I_y^{\star}(\lambda_j)$ in order to obtain the autocovariance functions
$\hat{\gamma}_y(k) = n^{-1}\sum_{j=1}^{n}I_y(\lambda_j)\,e^{i\lambda_j k}$ and $\hat{\gamma}_y^{\star}(k) = n^{-1}\sum_{j=1}^{n}I_y^{\star}(\lambda_j)\,e^{i\lambda_j k}$ and then normalize:

$$\hat{\rho}_y(k) = \frac{\hat{\gamma}_y(k)}{\hat{\gamma}_y(0)} \quad\text{and}\quad \hat{\rho}_y^{\star}(k) = \frac{\hat{\gamma}_y^{\star}(k)}{\hat{\gamma}_y^{\star}(0)}$$

In Figure 10.27, we compare these functions with the theoretical autocorrelation function
$\rho_y(k) = \phi^k$. We also report the empirical autocovariance function calculated directly by
the means of convolution. We verify that it is exactly equal to the inverse discrete Fourier
transform of the periodogram. However, we notice that the function $\hat{\rho}_y(k)$ does not converge
to zero when the lag k is large. This is not the case with the smoothed periodogram, which
is less biased in finite samples.


The Whittle estimator Whittle (1953) proposes an original method to estimate the
parameters $\theta$ of stationary Gaussian models in the frequency domain. Let us consider the
process $y_t$ and we note $Y = (y_1,\ldots,y_n)$ the vector of joint observations. Since $y_t$ is centered,
we have $Y \sim N(0,\Sigma)$ where $\Sigma$ is the Toeplitz covariance matrix. The log-likelihood function
is then:

$$\ell(\theta) = -\frac{n}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}Y^{\top}\Sigma^{-1}Y \qquad (10.55)$$

Gray (2006) shows that the eigendecomposition $U\Lambda U^{*}$ of $\Sigma$ can be approximated by $V\Lambda V^{*}$
where V is a circulant matrix. In this case, the eigenvalue $\Lambda_{j,j}$ is equal to $2\pi f_y(\lambda_j)$, whereas
the eigenvector $V_j$ is related to the Fourier coefficients:

$$V_{j,t} \propto n^{-1/2}e^{-i\lambda_j t}$$

Since V is a unitary matrix, we have:

$$\ln|\Sigma| \approx \ln\left|V\Lambda V^{*}\right| = \ln\left(|V|\,|\Lambda|\,|V^{*}|\right) = \ln|\Lambda| = n\ln 2\pi + \sum_{j=1}^{n}\ln f_y(\lambda_j)$$

43 If it is not the case, $y_t$ is padded with trailing zeros to length $2^m$ where m is the nearest integer greater
than or equal to $\ln n/\ln 2$.

44 For example, this result is valid if $y_t$ is a Gaussian process or if the autocovariance function decreases
rapidly. If $\lambda = 0$, the chi-squared distribution has one degree of freedom (see Exercise 10.3.14 on page 712).
and:

$$Y^{\top}\Sigma^{-1}Y \approx Y^{\top}\left(V\Lambda V^{*}\right)^{-1}Y = \sum_{j=1}^{n}\frac{|z_j|^2}{2\pi f_y(\lambda_j)}$$

where $Z = (z_1,\ldots,z_n) = VY$. We notice that:

$$|z_j|^2 = \left|n^{-1/2}\sum_{t=1}^{n}y_t\,e^{-i\lambda_j t}\right|^2 = 2\pi I_y(\lambda_j)$$

We deduce that the log-likelihood function can be written as:

$$\ell(\theta) = -n\ln 2\pi - \frac{1}{2}\sum_{j=1}^{n}\ln f_y(\lambda_j) - \frac{1}{2}\sum_{j=1}^{n}\frac{I_y(\lambda_j)}{f_y(\lambda_j)} \qquad (10.56)$$

There is a fundamental difference between time domain maximum likelihood (TDML) and
frequency domain maximum likelihood (FDML). Indeed, in the time domain, we can define
the log-likelihood for a given observation date t. In the frequency domain, defining the
log-likelihood for a given frequency $\lambda_j$ does not make sense.


In practice, we may observe a significant difference between the values given by Equations
(10.55) and (10.56). Nevertheless, TDML and FDML estimators are generally very close. For
example, we consider the AR(1) process $y_t = \phi y_{t-1} + \varepsilon_t$ where $\varepsilon_t \sim N(0,\sigma^2)$, $\phi = 0.5$ and
$\sigma = 20\%$. In Figure 10.28, we report the probability density function of the two estimators
$\hat{\phi}_{\mathrm{TDML}}$ and $\hat{\phi}_{\mathrm{FDML}}$ when the sample size is equal to 300. We verify that they are very close.
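A frequency domain estimation of this AR(1) example can be sketched as follows. The code maximizes the Whittle log-likelihood (10.56) over a grid of values of $\phi$, with $\sigma^2$ concentrated out analytically; the grid search, the function name `whittle_ar1`, the seed and the sample size of 2000 (larger than the 300 observations of Figure 10.28) are assumptions made for illustration, not the procedure used in the text.

```python
import numpy as np

def whittle_ar1(y, phi_grid=None):
    """Whittle (FDML) estimates of (phi, sigma) for a centered AR(1) process."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    lam = 2.0 * np.pi * np.arange(n) / n
    I = np.abs(np.fft.fft(y)) ** 2 / (2.0 * np.pi * n)     # periodogram (10.54)
    if phi_grid is None:
        phi_grid = np.linspace(-0.95, 0.95, 381)
    best_ll, best_phi, best_sigma = -np.inf, None, None
    for phi in phi_grid:
        inv_g = 1.0 - 2.0 * phi * np.cos(lam) + phi ** 2   # f = sigma^2 / (2 pi inv_g)
        sigma2 = 2.0 * np.pi * np.mean(I * inv_g)          # concentrated variance
        f = sigma2 / (2.0 * np.pi * inv_g)
        ll = -n * np.log(2.0 * np.pi) - 0.5 * np.log(f).sum() - 0.5 * (I / f).sum()
        if ll > best_ll:
            best_ll, best_phi, best_sigma = ll, phi, np.sqrt(sigma2)
    return best_phi, best_sigma

# Hypothetical simulation of the AR(1) process with phi = 0.5 and sigma = 20%
rng = np.random.default_rng(1)
n, phi, sigma = 2000, 0.5, 0.20
eps = sigma * rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

phi_hat, sigma_hat = whittle_ar1(y)
```

In a real application, a numerical optimizer would replace the grid, but the grid keeps the sketch free of any dependency other than NumPy.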


10.2.5.6 Extension to multidimensional processes 


The previous results can be generalized to the multivariate case. Let us now consider
the m-dimensional time series $y_t = (y_{t,1},\ldots,y_{t,m})$. Since $y_t$ is centered, the autocovariance
matrix is defined as $\Gamma_y(k) = E\left[y_t y_{t-k}^{\top}\right]$. We notice that $\Gamma_y(k)$ is not necessarily a symmetric
matrix⁴⁵, but we have $\Gamma_y(k) = \Gamma_y(-k)^{\top}$. The $m \times m$ spectral matrix $f_y(\lambda)$ is then defined
by:

$$f_y(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\Gamma_y(k)\,e^{-i\lambda k} \qquad (10.57)$$

It follows that $f_y(\lambda)$ is a Hermitian matrix. Moreover, the diagonal elements are real, but
the off-diagonal elements are complex. Similarly, the multivariate periodogram is equal to:

$$I_y(\lambda_j) = \frac{d_y(\lambda_j)\,d_y(\lambda_j)^{*}}{2\pi n}$$

where $\lambda_j = 2\pi(j-1)/n$, $j \in \{1,\ldots,n\}$ and $d_y(\lambda_j)$ is the multidimensional discrete Fourier
transform⁴⁶ of the time series $y_t$:

$$d_y(\lambda_j) = \sum_{t=1}^{n}y_t\,e^{-i\lambda_j t}$$

45 Because we generally have $E[x_t y_{t-k}] \neq E[y_t x_{t-k}]$ for two unidimensional time series $x_t$ and $y_t$.
46 $d_y(\lambda_j)$ is a vector of dimension m.
FIGURE 10.28: PDF of TDML and FDML estimators


The main properties obtained in the case m = 1 are also valid in the case m > 1. For
instance, we have:

1. the spectral matrix of the multidimensional white noise process $\varepsilon_t \sim N(0,\Sigma)$ is equal
to $f_y(\lambda) = (2\pi)^{-1}\Sigma$;

2. $\lim_{n\to\infty}E\left[I_y(\lambda)\right] = f_y(\lambda)$;

3. $\lim_{n\to\infty}\mathrm{var}\left(I_y(\lambda)\right) = f_y(\lambda)\odot f_y(\lambda)$;

4. if $x_t$ and $y_t$ are two independent multidimensional stochastic processes, then
$f_{x+y}(\lambda) = f_x(\lambda) + f_y(\lambda)$;

5. if $z_t = Ay_t$ and A is a real matrix, then $f_z(\lambda) = Af_y(\lambda)A^{\top}$;

6. the spectral density function of the linear filter $y_t = \Psi(L)\,x_t$ is given by:

$$f_y(\lambda) = \Psi(e^{-i\lambda})\,f_x(\lambda)\,\Psi(e^{i\lambda})^{\top}$$

The parameters $\theta$ of a stationary centered Gaussian model can also be estimated using
the FDML method with the following Whittle log-likelihood function:

$$\ell(\theta) \propto -\frac{1}{2}\sum_{j=1}^{n}\ln\left|f_y(\lambda_j)\right| - \frac{1}{2}\sum_{j=1}^{n}\mathrm{trace}\left(f_y(\lambda_j)^{-1}I_y(\lambda_j)\right) \qquad (10.59)$$
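These models can generally be written as a stationary state space model:

$$\begin{cases} y_t = Z\alpha_t + \varepsilon_t \\ \alpha_t = T\alpha_{t-1} + R\eta_t \end{cases}$$

where $\varepsilon_t \sim N(0,H)$ and $\eta_t \sim N(0,Q)$. It follows that:

$$y_t = Z\left(I - TL\right)^{-1}R\,\eta_t + \varepsilon_t$$

where L is the lag operator and I is the identity matrix, whose dimension is equal to the
size of the state vector $\alpha_t$. Using the above properties, we deduce that the spectral density
function of $y_t$ is:

$$\begin{aligned} f_y(\lambda) &= Z\left(I - Te^{-i\lambda}\right)^{-1}R\,f_\eta(\lambda)\,R^{\top}\left(\left(I - Te^{i\lambda}\right)^{-1}\right)^{\top}Z^{\top} + f_\varepsilon(\lambda) \\ &= \frac{Z\left(I - Te^{-i\lambda}\right)^{-1}RQR^{\top}\left(\left(I - Te^{i\lambda}\right)^{-1}\right)^{\top}Z^{\top} + H}{2\pi} \qquad (10.60) \end{aligned}$$

For instance, Roncalli (1996) extensively used the Whittle method to estimate VARMA,
SSM and complex Gaussian models.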


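We now consider the special case m = 2 and we note $z_t = (x_t, y_t)$ the bivariate process.
The autocovariance matrix $\Gamma_z(k)$ becomes:

$$\Gamma_z(k) = \begin{pmatrix}\gamma_x(k) & \gamma_{x,y}(k) \\ \gamma_{y,x}(k) & \gamma_y(k)\end{pmatrix}$$

where $\gamma_{x,y}(k)$ is the autocovariance function between $x_t$ and $y_t$. For the spectral matrix
$f_z(\lambda)$, we have:

$$f_z(\lambda) = \begin{pmatrix}f_x(\lambda) & f_{x,y}(\lambda) \\ f_{y,x}(\lambda) & f_y(\lambda)\end{pmatrix}$$

where $f_x(\lambda)$ and $f_y(\lambda)$ are the spectral density functions of the stochastic processes $x_t$ and
$y_t$ and $f_{x,y}(\lambda)$ is the cross spectrum:

$$f_{x,y}(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_{x,y}(k)\,e^{-i\lambda k}$$

Similarly, the bivariate periodogram of the process $z_t$ takes the following form:

$$I_z(\lambda_j) = \begin{pmatrix}I_x(\lambda_j) & I_{x,y}(\lambda_j) \\ I_{y,x}(\lambda_j) & I_y(\lambda_j)\end{pmatrix}$$

where $I_x(\lambda_j)$ and $I_y(\lambda_j)$ are the periodograms of the stochastic processes $x_t$ and $y_t$, and
$I_{x,y}(\lambda_j)$ is the cross periodogram:

$$I_{x,y}(\lambda_j) = \frac{d_x(\lambda_j)\,\overline{d_y(\lambda_j)}}{2\pi n}$$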

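Let us consider the bivariate process $z_t = (x_t, y_t)$, which has the following SSM form:

$$\begin{cases} z_t = \begin{pmatrix}1 & 0 & 0 \\ 0 & 1 & 0\end{pmatrix}\begin{pmatrix}\alpha_{1,t} \\ \alpha_{2,t} \\ \alpha_{3,t}\end{pmatrix} + \varepsilon_t \\ \begin{pmatrix}\alpha_{1,t} \\ \alpha_{2,t} \\ \alpha_{3,t}\end{pmatrix} = \begin{pmatrix}0.9 & 0 & 0 \\ 0.5 & -0.1 & 0 \\ 0 & 0 & 0.3\end{pmatrix}\begin{pmatrix}\alpha_{1,t-1} \\ \alpha_{2,t-1} \\ \alpha_{3,t-1}\end{pmatrix} + \begin{pmatrix}1 & 0 \\ 0 & 1 \\ 0 & 1\end{pmatrix}\eta_t \end{cases}$$

where $\varepsilon_t \sim N(0,\mathrm{diag}(0.5,0.2))$ and $\eta_t \sim N(0,\mathrm{diag}(1.0,1.0))$. We also have:

$$\begin{cases} x_t = \alpha_{1,t} + \varepsilon_{1,t} \\ y_t = \alpha_{2,t} + \varepsilon_{2,t} \end{cases}$$

and:

$$\begin{cases} \alpha_{1,t} = 0.9\cdot\alpha_{1,t-1} + \eta_{1,t} \\ \alpha_{2,t} = 0.5\cdot\alpha_{1,t-1} - 0.1\cdot\alpha_{2,t-1} + \eta_{2,t} \\ \alpha_{3,t} = 0.3\cdot\alpha_{3,t-1} + \eta_{2,t} \end{cases}$$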


Therefore, the bivariate process $(x_t, y_t)$ can be viewed as a noisy restricted VAR(1) model
where the residuals of the second VAR component follow an MA(1) process⁴⁷. Below, we give
the values of $f_z(\lambda)$ calculated with Equation (10.60):

λ       f_x(λ)    f_y(λ)    f_{x,y}(λ)        f_{y,x}(λ)
0       15.995    3.452     7.234             7.234
0.5     0.770     0.312     0.285 + 0.140i    0.285 − 0.140i
π/4     0.376     0.234     0.104 + 0.091i    0.104 − 0.091i
π/2     0.168     0.211     0.004 + 0.044i    0.004 − 0.044i
π       0.124     0.242     −0.024            −0.024

We verify that $f_x(\lambda)$ and $f_y(\lambda)$ are real numbers, $f_{x,y}(\lambda)$ and $f_{y,x}(\lambda)$ are complex numbers,
and $f_{y,x}(\lambda)$ is the complex conjugate of $f_{x,y}(\lambda)$.
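The values of this table can be reproduced with a direct implementation of Equation (10.60). In the sketch below, the matrices `Z`, `T`, `Q` and `H` are read from the equations of the example, whereas the third row of `R` is an assumption inferred from footnote 47; it does not affect the result because the third state does not enter the measurement equation.

```python
import numpy as np

Z = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
T = np.array([[0.9, 0.0, 0.0],
              [0.5, -0.1, 0.0],
              [0.0, 0.0, 0.3]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])          # third row assumed (alpha_3 is driven by eta_2)
Q = np.diag([1.0, 1.0])             # covariance matrix of eta_t
H = np.diag([0.5, 0.2])             # covariance matrix of eps_t

def spectral_matrix(lam):
    """Spectral matrix of z_t = (x_t, y_t) at frequency lam, following (10.60)."""
    G = Z @ np.linalg.inv(np.eye(3) - T * np.exp(-1j * lam)) @ R
    # G(e^{i lam})^T equals the conjugate transpose of G(e^{-i lam}) for real matrices
    return (G @ Q @ G.conj().T + H) / (2.0 * np.pi)

for lam in (0.0, 0.5, np.pi / 4, np.pi / 2, np.pi):
    f = spectral_matrix(lam)
    print(round(lam, 4), np.round(f[0, 0].real, 3), np.round(f[1, 1].real, 3),
          np.round(f[0, 1], 3), np.round(f[1, 0], 3))
```

The off-diagonal elements come out as complex conjugates of each other, which is the Hermitian property stated above.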


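In order to interpret the cross spectrum, we can write $f_{x,y}(\lambda)$ in the complex form:

$$f_{x,y}(\lambda) = \mathrm{cs}_{x,y}(\lambda) + i\,\mathrm{qs}_{x,y}(\lambda)$$

where $\mathrm{cs}_{x,y}(\lambda) = (2\pi)^{-1}\sum_{k=-\infty}^{\infty}\gamma_{x,y}(k)\cos(\lambda k)$ is the cospectrum and $\mathrm{qs}_{x,y}(\lambda) =
(2\pi)^{-1}\sum_{k=-\infty}^{\infty}\gamma_{x,y}(k)\sin(-\lambda k)$ is the quadrature spectrum. The cospectrum is the
simultaneous covariance between $x_t$ and $y_t$ at frequency $\lambda$, whereas the quadrature spectrum
is the lagged covariance⁴⁸ by the phase $\pi/2$. Alternatively, we can write $f_{x,y}(\lambda)$ in the polar
form:

$$f_{x,y}(\lambda) = r_{x,y}(\lambda)\,e^{i\theta_{x,y}(\lambda)}$$

where $r_{x,y}(\lambda)$ is the gain and $\theta_{x,y}(\lambda)$ is the phase spectrum (Engle, 1976). We have:

$$\begin{aligned} r_{x,y}(\lambda) &= \left|f_{x,y}(\lambda)\right| \\ &= \sqrt{\mathrm{cs}_{x,y}^2(\lambda) + \mathrm{qs}_{x,y}^2(\lambda)} \end{aligned}$$

It follows that the squared gain function $r_{x,y}^2(\lambda)$ is a dependence measure between $x_t$ and
$y_t$. On the contrary, $\theta_{x,y}(\lambda)$ determines the lead-lag relationship between $x_t$ and $y_t$.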
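As a small worked example, the two forms of the cross spectrum can be computed for the value $f_{x,y}(0.5) = 0.285 + 0.140i$ reported in the table of the previous example. The variable names are illustrative; the phase is taken as `atan2(qs, cs)`, which is an assumption consistent with the polar form used for transfer functions earlier in this section.

```python
import cmath
import math

f_xy = complex(0.285, 0.140)        # cross spectrum at lambda = 0.5 (table value)

cs = f_xy.real                      # cospectrum
qs = f_xy.imag                      # quadrature spectrum
gain = abs(f_xy)                    # r = sqrt(cs^2 + qs^2)
phase = cmath.phase(f_xy)           # theta = atan2(qs, cs), in radians

# The polar form recovers the complex form
f_xy_polar = gain * cmath.exp(1j * phase)
```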


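We consider the linear filtering model:

$$\begin{aligned} y_t &= \sum_{k=-\infty}^{\infty}\varphi_k L^k x_t + \varepsilon_t \\ &= \varphi(L)\,x_t + \varepsilon_t \end{aligned}$$

47 We have $\eta_{2,t} = \alpha_{3,t} - 0.3\,\alpha_{3,t-1}$.
48 Because we have:

$$\mathrm{qs}_{x,y}(\lambda) = (2\pi)^{-1}\sum_{k=-\infty}^{\infty}\gamma_{x,y}(k)\cos\left(\frac{\pi}{2} + \lambda k\right)$$

where $x_t$ and $\varepsilon_t$ are two independent stochastic processes and $\varepsilon_t \sim N(0,\sigma_\varepsilon^2)$. We have
$\gamma_{y,x}(k) = \sum_{s=-\infty}^{\infty}\varphi_s\gamma_x(k-s)$ and:

$$\begin{aligned} f_{y,x}(\lambda) &= \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}\varphi_s\gamma_x(k-s)\,e^{-i\lambda k} \\ &= \sum_{s=-\infty}^{\infty}\varphi_s e^{-i\lambda s}\left(\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_x(k-s)\,e^{-i\lambda(k-s)}\right) \\ &= \varphi(e^{-i\lambda})\,f_x(\lambda) \end{aligned}$$

It follows that:

$$\begin{aligned} f_y(\lambda) &= \left|\varphi(e^{-i\lambda})\right|^2 f_x(\lambda) + f_\varepsilon(\lambda) \\ &= \frac{\left|f_{y,x}(\lambda)\right|^2}{f_x(\lambda)} + f_\varepsilon(\lambda) \qquad (10.61) \end{aligned}$$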


We have decomposed the spectral density function of $y_t$ into two terms: the first term $f_{y|x}(\lambda)$
can be seen as the conditional expectation of $f_y(\lambda)$ with respect to $f_x(\lambda)$ while the second
term $f_\varepsilon(\lambda)$ is the component due to the noise process. Equation (10.61) is close to the
Gaussian conditional expectation formula or the linear regression of $y_t$ on $x_t$. The fraction
of the variance of $f_y(\lambda)$ explained by the linear filter, or the coefficient of determination
$R^2$, is equal to:

$$\begin{aligned} R^2 &= \frac{f_{y|x}(\lambda)}{f_y(\lambda)} \\ &= \frac{\left|f_{y,x}(\lambda)\right|^2}{f_y(\lambda)\,f_x(\lambda)} \\ &= \left|c_{y,x}(\lambda)\right|^2 \end{aligned}$$

$c_{y,x}(\lambda)$ is called the coherency function. If the two processes $x_t$ and $y_t$ are uncorrelated,
then $f_{y,x}(\lambda) = 0$ and $c_{y,x}(\lambda) = 0$. If $\sigma_\varepsilon = 0$, then $f_\varepsilon(\lambda) = 0$, $\left|f_{y,x}(\lambda)\right|^2 = f_y(\lambda)\,f_x(\lambda)$
and $\left|c_{y,x}(\lambda)\right| = 1$. We deduce that $0 \leq R^2 \leq 1$ and $0 \leq \left|c_{y,x}(\lambda)\right|^2 \leq 1$. On page 610, we have
seen that the coefficient of determination $R^2$ in the time domain associated to the linear
regression $y_t = \beta_0 + \beta x_t + u_t$ is equal to the square of the cross-correlation $\rho_{x,y}$ between $x_t$
and $y_t$. By analogy, the coherence function $c_{y,x}(\lambda)$ can be viewed as the cross-correlation
between $x_t$ and $y_t$ in the frequency domain (Engle, 1976). Moreover, we have:


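$$\begin{aligned} c_{y,x}(\lambda) &= \frac{f_{y,x}(\lambda)}{\sqrt{f_y(\lambda)\,f_x(\lambda)}} \\ &= \frac{r_{y,x}(\lambda)\,e^{i\theta_{y,x}(\lambda)}}{\sqrt{f_y(\lambda)\,f_x(\lambda)}} \\ &= \left|c_{y,x}(\lambda)\right|e^{i\theta_{y,x}(\lambda)} \end{aligned}$$

Since $c_{y,x}(\lambda)$ is a complex function, it is less easy to manipulate than a correlation function.
This is why its modulus $\left|c_{y,x}(\lambda)\right|$ may be preferred to define the cross-correlation between $x_t$ and $y_t$ in
the frequency domain. For example, we consider the bivariate process: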


Tt = 0.5 Pay Lt-1 + Et 
Yt ya 0.5 Yt-1 M 


where $\varepsilon_t$ and $\eta_t$ are two uncorrelated white noise processes with the same variance. In Figure
10.29, we represent $c_{y,x}(\lambda)$ in polar coordinates.


FIGURE 10.29: Coherency function $c_{y,x}(\lambda)$


10.2.5.7 Some applications 


White noise testing There are many spectral procedures for testing the white noise hypothesis.
For instance, we have shown that the asymptotic distribution of $2f_y(\lambda)^{-1}I_y(\lambda)$
is a chi-squared distribution $\chi_2^2$. The hypothesis $H_0 : y_t \sim N(0,\sigma^2)$ implies that
$4\pi\sigma^{-2}I_y(\lambda) \sim \chi_2^2$. Therefore, it suffices to estimate the empirical volatility and use goodness
of fit tests like the Kolmogorov-Smirnov statistic or a QQ plot (Pawitan and O’Sullivan,
1994) for testing the null hypothesis:

$$H_0 : \frac{4\pi}{\hat{\sigma}^2}I_y(\lambda) \sim \chi_2^2$$

Another idea is to verify that the periodogram does not contain a value significantly larger
than the other values. Since the cumulative distribution function of $\chi_2^2$ is $F(x) = 1 - e^{-x/2}$,
we deduce that:

$$\Pr\left\{\frac{4\pi}{\hat{\sigma}^2}I_y(\lambda) \leq x\right\} = F(x) = 1 - e^{-x/2}$$
Let $I_y^\star$ be the maximum periodogram ordinate:

$$I_y^\star = \sup\{I_y(\lambda_j) : j = 1, \ldots, q\}$$

where $q = \lfloor n/2 \rfloor$ is the largest integer less than $n/2$. We note:

$$\xi = \sqrt{\frac{2\pi I_y^\star}{\hat\sigma^2}}$$

If $j \ne j'$, then $I_y(\lambda_j)$ and $I_y(\lambda_{j'})$ are independent and we have:

$$\Pr\{\xi \le x\} = F(x) = \left(1 - e^{-x^2}\right)^q$$

Rejecting the null hypothesis at the confidence level $\alpha$ is equivalent to verifying that $\xi$ is too
large or $F(\xi) > \alpha$. Therefore, we deduce that $H_0$ is rejected if the following condition is
satisfied:

$$\xi > \sqrt{\ln\frac{1}{1 - \alpha^{1/q}}}$$


Remark 136 We can also test the presence of a unit root by considering the null hypothesis
$H_0 : S(y_t) = y_t - y_{t-1} \sim \mathcal{N}(0, \sigma^2)$.


Cycle identification Cycle testing can be viewed as the contrary of white noise testing.
Indeed, if the process $y_t$ contains a cycle at frequency $\lambda_c$, we must observe a peak in the
periodogram. Fisher (1929) defined the $g$ statistic:

$$g = \frac{I_y(\lambda_c)}{\sum_{j=1}^{q} I_y(\lambda_j)}$$

where $q = \lfloor n/2 \rfloor$ is the largest integer less than $n/2$. Under the null hypothesis that $y_t$
is a Gaussian white noise process (no cycle at frequency $\lambda_c$), Fisher showed that the distribution function of $g$ satisfies⁴⁹:

$$\Pr\{g \ge x\} = 1 - \sum_{j=0}^{q} (-1)^j \binom{q}{j} (1 - jx)_+^{q-1}$$

where $(a)_+ = \max(a, 0)$. If this probability is lower than the level $\alpha$, then we accept the presence of a cycle. In
practice, the frequency $\lambda_c$ is unknown and is estimated by the location of the largest periodogram value:

$$\hat\lambda_c = \arg\sup_{\lambda_j} I_y(\lambda_j)$$


We consider the famous example of the Canadian lynx data set, which collects the annual
record of the number of Canadian lynx 'trapped' in the Mackenzie River district of
North-West Canada for the period 1821-1934 (Tong, 1990). In the first panel in Figure
10.30, we represent the time series $y_t = \log x_t$ where $x_t$ is the number of lynx. We also
report the periodogram $I_y(\lambda)$ in the second panel and we observe a peak at the frequency
$\lambda_c = 0.6614$. The Fisher statistic is equal to $g = 0.59674$ whereas the p-value is close to zero.
We deduce that the cycle period is equal to $p = 2\pi/\lambda_c = 9.5$ years. In the third and fourth
panels, we have represented the cycle $c_t$ and the residuals $\varepsilon_t = y_t - c_t$. Finally, we obtain
the following model:

$$y_t = c_t + \varepsilon_t = 2.904 + 0.607 \cdot \sin\left(\frac{2\pi}{9.5} \cdot t - 1.138\right) + \varepsilon_t$$
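
Fisher's test can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind Figure 10.30: the function names `periodogram` and `fisher_g_test` are hypothetical, the series is demeaned before the transform, the Fourier frequencies are assumed to be $\lambda_j = 2\pi j/n$, and the tail probability is written in the equivalent finite-sum form $\sum_{j=1}^{\lfloor 1/g \rfloor} (-1)^{j-1} \binom{q}{j} (1 - jg)^{q-1}$. The data are simulated (a sine wave of period 10 plus Gaussian noise), not the lynx series.

```python
import cmath
import math
import random


def periodogram(y):
    """Periodogram ordinates I_y(lambda_j) for j = 1..q, q = largest integer below n/2."""
    n = len(y)
    mean = sum(y) / n
    q = (n - 1) // 2
    ordinates = []
    for j in range(1, q + 1):
        lam = 2.0 * math.pi * j / n
        d = sum((v - mean) * cmath.exp(-1j * lam * t) for t, v in enumerate(y, start=1))
        ordinates.append(abs(d) ** 2 / (2.0 * math.pi * n))
    return ordinates


def fisher_g_test(y):
    """Return (g, p_value, lambda_c) for Fisher's test of a hidden periodicity."""
    n = len(y)
    ordinates = periodogram(y)
    q = len(ordinates)
    j_max = max(range(q), key=lambda j: ordinates[j])
    g = ordinates[j_max] / sum(ordinates)
    # Tail probability under the Gaussian white noise hypothesis (finite-sum form).
    p_value = 0.0
    for j in range(1, min(q, int(1.0 / g)) + 1):
        p_value += (-1) ** (j - 1) * math.comb(q, j) * (1.0 - j * g) ** (q - 1)
    p_value = min(max(p_value, 0.0), 1.0)
    lambda_c = 2.0 * math.pi * (j_max + 1) / n
    return g, p_value, lambda_c


# Simulated example: one cycle of period 10 hidden in noise (assumed parameters).
rng = random.Random(42)
n_obs = 200
series = [math.sin(2.0 * math.pi * t / 10.0) + rng.gauss(0.0, 0.5) for t in range(1, n_obs + 1)]
g_stat, p_val, lam_c = fisher_g_test(series)
print(g_stat, p_val, 2.0 * math.pi / lam_c)
```

Because $g$ is a ratio of periodogram ordinates, it does not depend on the normalization chosen for $I_y(\lambda)$; only the p-value formula relies on the white noise hypothesis.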


49 When the number of observations $n$ is large, Priestley (1981) showed that the probability distribution
of the statistic $g^\star = 2ng$ is $\Pr\{g^\star > x\} = n e^{-x/2}$.
FIGURE 10.30: Detection of the cycle in the Canadian lynx data set


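Long memory time series and fractional processes The fractional white noise
(FWN) process is defined as:

$$(1 - L)^d y_t = \varepsilon_t$$

where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ and⁵⁰ $|d| < 1/2$. Granger and Joyeux (1980) and Hosking (1981)
showed that $y_t$ has an infinite moving average representation:

$$y_t = (1 - L)^{-d} \varepsilon_t = \sum_{k=0}^{\infty} \theta_k \varepsilon_{t-k}$$

where⁵¹:

$$\theta_k = \frac{\Gamma(k + d)}{\Gamma(d)\,\Gamma(k + 1)}$$

We can also write $y_t$ as an infinite auto-regressive process:

$$\sum_{k=0}^{\infty} \phi_k y_{t-k} = \varepsilon_t$$

where:

$$\phi_k = \frac{\Gamma(k - d)}{\Gamma(-d)\,\Gamma(k + 1)}$$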


In Figure 10.31, we report the function $\phi_k$ for several values of $d$. If $d < 0$ (respectively


50 This condition implies that $y_t$ is stationary.

51 If $\alpha > 0$, the gamma function $\Gamma(\alpha)$ is equal to $\int_0^\infty t^{\alpha-1} e^{-t}\,\mathrm{d}t$. We also have $\Gamma(0) = \infty$ and $\Gamma(\alpha + 1) =
\alpha\,\Gamma(\alpha)$. This last property is used to calculate $\Gamma(\alpha)$ when $\alpha$ is negative. For instance, $\Gamma(-0.5) = -2\,\Gamma(0.5) \approx
-3.5449$.
FIGURE 10.31: AR representation of the fractional process


$d > 0$), then $\phi_k$ is negative (respectively positive) for $k \ge 1$. Hosking (1981) showed that⁵²:

$$\gamma_y(k) = \sigma^2\, \frac{\Gamma(1 - 2d)\,\Gamma(k + d)}{\Gamma(d)\,\Gamma(1 - d)\,\Gamma(k - d + 1)}$$

and:

$$\rho_y(k) = \frac{\Gamma(1 - d)\,\Gamma(k + d)}{\Gamma(d)\,\Gamma(k - d + 1)}$$

We verify that $\rho_y(k) < 0$ if $d < 0$ and $\rho_y(k) > 0$ if $d > 0$. In the case of the AR(1) process
$y_t = \phi y_{t-1} + \varepsilon_t$, we recall that the autocorrelation function is $\rho_y(k) = \phi^k$. When $d < 0$,
there is a big difference between FWN and AR(1) processes, because $\rho_y(k)$ is an oscillating
function in the case of the AR(1) process. When $d > 0$, the tapering pattern is more
pronounced for the AR(1) than for the FWN process. In order to understand this difference,
we give here the asymptotic behavior of the coefficients (Baillie, 1996):


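$$\theta_k \sim \frac{k^{d-1}}{\Gamma(d)}, \qquad \phi_k \sim \frac{k^{-d-1}}{\Gamma(-d)}, \qquad \rho_y(k) \sim \frac{\Gamma(1 - d)}{\Gamma(d)}\, k^{2d-1}$$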


52 The variance of $y_t$ is infinite when the fractional differencing parameter $d$ is equal to 1/2.

We deduce that these coefficients decline at a slower rate than for the AR(1) process⁵³. For
instance, if $d = 1/2$, then $\rho_y(k) = 1$. In Table 10.12, we calculate $\rho_y(k)$ for different values of $d$.
When we compare these results with the ones obtained for an AR(1) process with parameter
$\phi$, we observe the hyperbolic decay for the FWN process. For instance, for the FWN process
with $d = 0.499$, $\rho_y(10)$ and $\rho_y(10000)$ are respectively equal to 99.15% and 97.79%. For the
AR(1) process with $\phi = 0.999$, we obtain $\rho_y(10) = 99\%$ but $\rho_y(10000) = 0\%$⁵⁴. We also
obtain the following paradox: the autocorrelation $\rho_y(10000)$ is higher for the FWN
process with $d = 0.20$ than for the AR(1) process with $\phi = 0.999$!


TABLE 10.12: Autocorrelation function $\rho_y(k)$ (in %) of FWN and AR(1) processes

              FWN process ($d$)            AR(1) process ($\phi$)
      k      0.2     0.45    0.499      0.50     0.90    0.999
      0   100.00   100.00   100.00    100.00   100.00   100.00
      1    25.00    81.82    99.60     50.00    90.00    99.90
      5     9.65    69.90    99.29      3.13    59.05    99.50
     10     6.37    65.22    99.15      0.10    34.87    99.00
    100     1.60    51.81    98.69      0.00     0.00    90.48
    500     0.61    44.11    98.38      0.00     0.00    60.64
   1000     0.40    41.15    98.24      0.00     0.00    36.77
   5000     0.15    35.04    97.93      0.00     0.00     0.67
  10000     0.10    32.69    97.79      0.00     0.00     0.00

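
The entries of Table 10.12 can be checked numerically. The following sketch uses only the Python standard library; the function names are hypothetical, and the autocorrelation routine is restricted to $0 < d < 1/2$ because `math.lgamma` returns the logarithm of the absolute value of the gamma function, so the sign would be lost for negative arguments.

```python
import math


def fwn_ma_coefficients(d, n_terms):
    """theta_k = Gamma(k + d) / (Gamma(d) Gamma(k + 1)), computed with the
    recursion theta_k = theta_{k-1} * (k - 1 + d) / k starting from theta_0 = 1."""
    theta = [1.0]
    for k in range(1, n_terms):
        theta.append(theta[-1] * (k - 1 + d) / k)
    return theta


def fwn_autocorrelation(d, k):
    """rho_y(k) = Gamma(1 - d) Gamma(k + d) / (Gamma(d) Gamma(k - d + 1)), for 0 < d < 1/2."""
    log_rho = (math.lgamma(1.0 - d) + math.lgamma(k + d)
               - math.lgamma(d) - math.lgamma(k - d + 1.0))
    return math.exp(log_rho)


def ar1_autocorrelation(phi, k):
    """rho_y(k) = phi^k for the AR(1) process."""
    return phi ** k


for lag in (1, 10, 10000):
    print(lag,
          round(100.0 * fwn_autocorrelation(0.2, lag), 4),
          round(100.0 * ar1_autocorrelation(0.999, lag), 4))
```

Working with log-gamma values avoids the overflow that a direct evaluation of $\Gamma(k + d)$ would produce for large lags such as $k = 10000$.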
Since $y_t = (1 - L)^{-d} \varepsilon_t$ is a stationary process, we deduce that its spectral density
function is:

$$f_y(\lambda) = \left|1 - e^{-i\lambda}\right|^{-2d} f_\varepsilon(\lambda) = \frac{\sigma^2}{2\pi} \left|1 - e^{-i\lambda}\right|^{-2d} = \frac{\sigma^2}{2\pi} \left(2 \sin\frac{\lambda}{2}\right)^{-2d}$$

In Figure 10.32, we represent the function $f_y(\lambda)$ for different values of $d$, and we compare
$f_y(\lambda)$ with the spectral density function obtained for the AR(1) process. We notice that
the high frequency components dominate when $d$ is negative, whereas the spectral density
function is concentrated at low frequencies when $d$ is positive. In particular, we have $f_y(0) =
+\infty$ when $d > 0$.
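
The passage from $|1 - e^{-i\lambda}|$ to $2\sin(\lambda/2)$ in the spectral density of the FWN process rests on an elementary identity, which can be checked directly for $\lambda \in [0, \pi]$:

$$\left|1 - e^{-i\lambda}\right|^2 = (1 - \cos\lambda)^2 + \sin^2\lambda = 2 - 2\cos\lambda = 4\sin^2\frac{\lambda}{2}$$

so that $|1 - e^{-i\lambda}| = 2\sin(\lambda/2)$ on this range.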


Granger and Joyeux (1980) extended the FWN process to a larger class of stationary
processes called autoregressive fractionally integrated moving average (or ARFIMA)
models. The ARFIMA($p$,$d$,$q$) model is defined by:

$$\Phi(L)(1 - L)^d y_t = \Theta(L)\,\varepsilon_t$$

where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$. This model can be seen as an extension of ARMA models by introducing
a fractional unit root. This is why $y_t$ is said to be fractionally integrated and we
note $y_t \sim I(d)$ where $|d| < 1/2$. The properties of the FWN process can be generalized


53 Fractional white noise processes are also called hyperbolic decay time series, whereas autoregressive
processes are called geometric decay time series.
54 In fact, the unrounded autocorrelation is equal to $\rho_y(10000) = 0.0045\%$.
FIGURE 10.32: Spectral density function of the FWN process


to the ARFIMA process. For instance, the stationary ARFIMA($p$,$d$,$q$) process possesses
long memory if $d > 0$ and short memory if $d < 0$. The AR($\infty$) and MA($\infty$) representations
are $\Phi^\star(L)\, y_t = \varepsilon_t$ and $y_t = \Theta^\star(L)\,\varepsilon_t$ where $\Phi^\star(L) = \Phi(L)(1 - L)^d\, \Theta(L)^{-1}$ and
$\Theta^\star(L) = \Phi(L)^{-1}(1 - L)^{-d}\, \Theta(L)$. The coefficients of $\Phi^\star(L)$ and $\Theta^\star(L)$ can be calculated
numerically using truncated convolutions. Sowell (1992) also provides exact formulations of
the autocovariance function. Concerning the spectral density function, we obtain:

$$f_y(\lambda) = \frac{\sigma^2}{2\pi} \left(2\sin\frac{\lambda}{2}\right)^{-2d} \frac{\left|\Theta(e^{-i\lambda})\right|^2}{\left|\Phi(e^{-i\lambda})\right|^2}$$

If we consider the ARFIMA(1,$d$,1) model defined by $(1 - \phi L)(1 - L)^d y_t = (1 - \theta L)\,\varepsilon_t$, we
obtain:

$$f_y(\lambda) = \frac{\sigma^2}{2\pi} \left(2\sin\frac{\lambda}{2}\right)^{-2d} \frac{1 - 2\theta\cos\lambda + \theta^2}{1 - 2\phi\cos\lambda + \phi^2}$$

We consider the ARMA processes represented in Figure 10.23 by adding a long memory
component. Results are given in Figure 10.33. We notice how the function $(2\sin(\lambda/2))^{-2d}$
impacts the white noise process. When $d < 0$ (respectively $d > 0$), the spectral density
function becomes an increasing (respectively decreasing) function. This function is equal to
1 for $\lambda^\star = 2\arcsin(1/2) = 1.047$ radians. When $d < 0$, the short memory part reduces low
frequency components and magnifies high frequency components. We observe the contrary
when $d > 0$. This is consistent with the previous analysis since the ARFIMA process is
persistent when $d > 0$ while it is mean-reverting when $d < 0$.
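
The ARFIMA(1,$d$,1) spectral density is straightforward to evaluate. The sketch below is an illustration with a hypothetical function name and default arguments chosen for convenience ($\sigma^2 = 1$, no ARMA part); it simply codes the closed-form expression for frequencies $\lambda \in (0, \pi]$.

```python
import math


def arfima_spectral_density(lam, d, phi=0.0, theta=0.0, sigma2=1.0):
    """Spectral density of (1 - phi L)(1 - L)^d y_t = (1 - theta L) eps_t at frequency lam."""
    long_memory = (2.0 * math.sin(lam / 2.0)) ** (-2.0 * d)
    ma_part = 1.0 - 2.0 * theta * math.cos(lam) + theta ** 2
    ar_part = 1.0 - 2.0 * phi * math.cos(lam) + phi ** 2
    return sigma2 / (2.0 * math.pi) * long_memory * ma_part / ar_part


# At lam = pi / 3 the fractional factor equals one, whatever the value of d.
pivot = math.pi / 3.0
print(arfima_spectral_density(pivot, d=0.3), 1.0 / (2.0 * math.pi))

# d > 0 concentrates the spectrum at low frequencies, d < 0 at high frequencies.
print(arfima_spectral_density(0.1, d=0.3), arfima_spectral_density(3.0, d=0.3))
print(arfima_spectral_density(0.1, d=-0.3), arfima_spectral_density(3.0, d=-0.3))
```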


FIGURE 10.33: Spectral density function of the ARFIMA process (panels include White noise and AR(1) with $\phi = 0.5$)

There are different approaches for estimating the ARFIMA($p$,$d$,$q$) model. Sowell (1992)
derives the exact ML estimator by considering the joint distribution of the sample $Y =
(y_1, \ldots, y_n)$. We recall that the log-likelihood function is:

$$\ell(\theta) = -\frac{n}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}\, Y^\top \Sigma^{-1} Y$$

where $\Sigma$ is the covariance matrix of $Y$:

$$\Sigma = \begin{pmatrix}
\gamma_y(0) & \gamma_y(1) & \cdots & \gamma_y(n-1) \\
\gamma_y(1) & \gamma_y(0) & \cdots & \gamma_y(n-2) \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_y(n-1) & \gamma_y(n-2) & \cdots & \gamma_y(0)
\end{pmatrix}$$

However, this approach may be time-consuming because it requires the inverse of the $n \times n$
matrix $\Sigma$, where the elements $\Sigma_{i,j} = \gamma_y(|i - j|)$ are themselves complicated to calculate⁵⁵.
This is why it is better to estimate the vector $\theta$ of parameters by considering the FDML
approach. The Whittle log-likelihood is:


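$$\ell(\theta) \propto -\frac{1}{2}\sum_{j} \ln\left(\frac{\sigma^2}{2\pi}\left(2\sin\frac{\lambda_j}{2}\right)^{-2d}\frac{\left|\Theta(e^{-i\lambda_j})\right|^2}{\left|\Phi(e^{-i\lambda_j})\right|^2}\right) - \frac{1}{2n}\sum_{j}\frac{\left|\sum_{t} y_t e^{-i\lambda_j t}\right|^2 \left|\Phi(e^{-i\lambda_j})\right|^2}{\sigma^2\left(2\sin\frac{\lambda_j}{2}\right)^{-2d}\left|\Theta(e^{-i\lambda_j})\right|^2}$$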

Another famous approach is the semiparametric estimation proposed by Geweke and Porter-Hudak
(1983). When $d > 0$, we have seen that the low frequency part dominates the
spectrum:

$$f_y(\lambda) \approx \frac{\sigma^2}{2\pi}\left(2\sin\frac{\lambda}{2}\right)^{-2d}$$

when $\lambda \to 0$. We deduce that:

$$\ln f_y(\lambda) \approx \left(\ln\sigma^2 - \ln 2\pi\right) - d\,\ln\left(4\sin^2\frac{\lambda}{2}\right)$$

Geweke and Porter-Hudak (1983) estimate the parameters $d$ and $\sigma$ by considering the
following linear regression:

$$\ln I_y(\lambda_j) = c - d\,\ln\left(4\sin^2\frac{\lambda_j}{2}\right) + u_j$$

where⁵⁶ $\lambda_j \le \lambda_{\min}$.

55 Sowell (1992) shows that they are functions of the hypergeometric function.
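
The log-periodogram regression can be sketched as follows. This is an illustrative implementation under stated assumptions, not the estimation code used in the text: the names `simulate_fwn` and `gph_estimate` are hypothetical, the FWN sample is approximated by truncating the MA($\infty$) representation at `n_lags` terms, the Fourier frequencies are $\lambda_j = 2\pi j/n$, and the regression uses the first $m = \lfloor\sqrt{n}\rfloor$ frequencies.

```python
import cmath
import math
import random


def simulate_fwn(n, d, seed=0, n_lags=1500):
    """Approximate FWN sample: MA representation truncated at n_lags terms."""
    rng = random.Random(seed)
    theta = [1.0]
    for k in range(1, n_lags):
        theta.append(theta[-1] * (k - 1 + d) / k)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + n_lags)]
    return [sum(theta[k] * eps[t + n_lags - k] for k in range(n_lags)) for t in range(n)]


def gph_estimate(y, m=None):
    """Estimate d as minus the OLS slope of ln I_y(lambda_j) on ln(4 sin^2(lambda_j / 2))."""
    n = len(y)
    if m is None:
        m = int(math.sqrt(n))
    mean = sum(y) / n
    xs, zs = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        dft = sum((v - mean) * cmath.exp(-1j * lam * t) for t, v in enumerate(y, start=1))
        xs.append(math.log(4.0 * math.sin(lam / 2.0) ** 2))
        zs.append(math.log(abs(dft) ** 2 / (2.0 * math.pi * n)))
    x_bar = sum(xs) / m
    z_bar = sum(zs) / m
    slope = (sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
             / sum((x - x_bar) ** 2 for x in xs))
    return -slope


sample = simulate_fwn(1500, d=0.4, seed=1)
d_hat = gph_estimate(sample)
print(d_hat)
```

With only about $\sqrt{n}$ regression points the estimator is noisy, so the estimate should be read as a rough indication of $d$ rather than a precise value.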


The fractional white noise process is related to the Hurst exponent $H$, which can be
characterized in several ways:

1. let $c > 0$ be a scalar; the probability distribution of $y_t$ at the date $ct$ is equal to the
probability distribution of $y_t$ at the time $t$ multiplied by $c^H$:

$$y_{ct} \overset{D}{\sim} c^H y_t$$

we say that the process $y_t$ is $H$ self-similar or has the self-similarity property;

2. if we consider the asymptotic distribution of $\rho_y(k)$, we verify that:

$$\rho_y(k) \sim |k|^{2H-2}$$

3. for the spectral generating function, we obtain⁵⁷:

$$f_y(\lambda) \sim |\lambda|^{-2H+1}$$

4. the Hurst exponent is related to the fractional differencing parameter since we have:

$$H = d + \frac{1}{2}$$

From the previous properties, we notice that $H \in [0, 1]$.


For estimating $H$, we generally use the R/S (or rescaled range) statistic. Let $S_t$ and $R_t$
be the standard deviation and the range of the sample $\{y_1, \ldots, y_t\}$. We have:

$$S_t = \sqrt{\frac{1}{t}\sum_{j=1}^{t}(y_j - \bar y)^2}$$

and:

$$R_t = \max_{k \le t}\sum_{j=1}^{k}(y_j - \bar y) - \min_{k \le t}\sum_{j=1}^{k}(y_j - \bar y)$$

where $\bar y = n^{-1}\sum_{t} y_t$. Lo (1991) defines the rescaled range statistic called $Q_t$ as:

$$Q_t = \frac{R_t}{S_t}$$

We can show that $Q_t \sim c\, t^H$ when $t \to \infty$. We can then estimate the Hurst exponent by
performing the linear regression:

$$\log Q_t = a + H \log t + u_t$$
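
A minimal sketch of the R/S regression is given below. The function names are hypothetical, and one modelling choice is an assumption of the sketch: the mean and the standard deviation are computed on the first $t$ observations of each subsample. The data are simulated Gaussian white noise, for which $H$ should be close to 0.5.

```python
import math
import random


def rescaled_range(y):
    """Q = R / S: range of the cumulative deviations divided by the standard deviation."""
    t = len(y)
    mean = sum(y) / t
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / t)
    cumulative, partial_sums = 0.0, []
    for v in y:
        cumulative += v - mean
        partial_sums.append(cumulative)
    return (max(partial_sums) - min(partial_sums)) / s


def ols_slope(xs, zs):
    """Slope of the simple linear regression of zs on xs."""
    x_bar = sum(xs) / len(xs)
    z_bar = sum(zs) / len(zs)
    return (sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
            / sum((x - x_bar) ** 2 for x in xs))


def hurst_exponent(y, sizes):
    """Estimate H as the slope of log Q_t on log t, Q_t being computed on y[:t]."""
    xs = [math.log(t) for t in sizes]
    zs = [math.log(rescaled_range(y[:t])) for t in sizes]
    return ols_slope(xs, zs)


rng = random.Random(7)
returns = [rng.gauss(0.0, 1.0) for _ in range(2000)]
h_hat = hurst_exponent(returns, sizes=[50, 100, 200, 400, 800, 1600])
v_stat = rescaled_range(returns) / math.sqrt(len(returns))
print(h_hat, v_stat)
```

The last line also reports the ratio of the rescaled range to $\sqrt{t}$ for the full sample, which is the quantity compared with a confidence interval when testing $H = 0.5$.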


56 Generally, $\lambda_{\min}$ is set to $2\pi/\sqrt{n}$.

57 Since we have $c^{-H} y_{ct} \overset{D}{\sim} y_t$, the spectrum satisfies the equation $c^{2H-1} f_y(c\lambda) = f_y(\lambda)$.

Lo (1991) proposes to test the null hypothesis $H_0 : H = 0.5$ by considering the statistic
$V_t = Q_t/\sqrt{t}$. Under the null hypothesis, the 95% confidence interval is [0.809, 1.862].


We consider the daily return of the S&P 500 index and the daily variation of the VIX
index from January 2007 to December 2017. In Figure 10.34, we report the estimated
relationship $\log Q_t = \hat a + \hat H \log t$ for the two time series. We obtain $\hat H = 0.56$ for the
S&P 500 index and $\hat H = 0.41$ for the VIX index. If we use the statistic $V_t$, we do not reject
the null hypothesis $H_0$ at the 95% confidence level. However, we see that $V_t$ reaches the
lower bound in the case of the VIX index, whereas it can be higher than the upper bound
in the case of the S&P 500 index. The S&P 500 could then exhibit long-range dependence,
whereas the VIX index may be more mean-reverting. However, the Whittle estimate $\hat d$ is
respectively equal to -0.09 and -0.16 and is significant at the 99% confidence level. This
confirms that the VIX index has short memory, but it contradicts the hypothesis that the S&P 500 index
has long memory.


FIGURE 10.34: R/S analysis and estimation of the Hurst exponent (panels: S&P 500 Index and VIX Index)


Signal decomposition On page 693, we do not explain how the cyclical component $c_t$ is
calculated. We reiterate that the Fourier transform of the logarithm of the number of lynx
$y_t$ is given by $d_y(\lambda_j) = \sum_{t} y_t e^{-i\lambda_j t}$. To recover the signal, we use the inverse Fourier
transform $y_t = n^{-1}\sum_{j} d_y(\lambda_j) e^{i\lambda_j t}$. We define $d_y^c(\lambda)$ as follows:

$$d_y^c(\lambda) = \begin{cases} d_y(\lambda) & \text{if } \lambda = \lambda_c \text{ or } \lambda = 2\pi - \lambda_c \\ 0 & \text{otherwise} \end{cases}$$

Estimating the cyclical component is then equivalent to applying the inverse Fourier transform to
the Fourier coefficients corresponding to the cycle frequency $\lambda_c$.

The previous method can be generalized to a partition of $\Lambda = \{\lambda_1, \ldots, \lambda_n\}$:

$$\Lambda = \bigcup_{k=1}^{m} \Lambda_k$$

where $\Lambda_k \cap \Lambda_{k'} = \emptyset$ for $k \ne k'$. We define the component $y_t^k$ as the inverse Fourier transform of the
coefficients $d_y^k(\lambda)$:

$$d_y^k(\lambda) = \begin{cases} d_y(\lambda) & \text{if } \lambda \in \Lambda_k \\ 0 & \text{otherwise} \end{cases}$$

It follows that we have decomposed the original signal $y_t$ into $m$ signals⁵⁸:

$$y_t = \sum_{k=1}^{m} y_t^k$$

This method allows us to extract given frequency components. It is related to Parseval's theorem,
which states that the sum of squares of a time series is equal to the sum of squares
of its Fourier transform:

$$\sum_{t=1}^{n} |y_t|^2 = \frac{1}{n}\sum_{j=1}^{n} |d_y(\lambda_j)|^2$$


We consider the time series $y_t = c_t + u_t$, which is the sum of a long-term cyclical
component and a residual component. The long-term component is the sum of three cycles
$c_t^k$, whose periods are larger than 5 years. The residual component is the sum of a white
noise process and 5 short-term cycles, whose periods are lower than 1 year. We represent $y_t$
and the components $c_t^k$, $c_t$ and $u_t$ in Figure 10.35 for the 2500 dates⁵⁹. Then, we consider
the signal reconstruction based on the $m$ most significant frequencies:

$$y_t^m = n^{-1}\sum_{j=1}^{n} d_y^m(\lambda_j)\, e^{i\lambda_j t}$$

where $d_y^m(\lambda_j)$ is equal to zero for the 2500 frequencies except for the $m$ frequencies with
the highest values of $|d_y(\lambda_j)|$. Figure 10.36 shows the reconstructed signal $y_t^m$ for different
values of $m$. We notice that we may describe the dynamics of $y_t$ with very few Fourier
coefficients. Using Parseval's theorem, we define the energy ratio as follows:

$$\mathrm{ER}_m = \frac{\sum_{t} |y_t^m|^2}{\sum_{t} |y_t|^2} = \frac{\sum_{k=1}^{m} |d_{k:n}|^2}{\sum_{j} |d_y(\lambda_j)|^2}$$

where $d_{k:n}$ is the $k$-th reverse order statistic of $d_y(\lambda_j)$. Results in Table 10.13 show that one
Fourier frequency explains 35% of the total variance of $y_t$, two Fourier frequencies explain
70% of the total variance of $y_t$, etc. With 50 Fourier frequencies, that is 2% of all the Fourier
frequencies, we explain more than 96% of the total variance of $y_t$.
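
The reconstruction and the energy ratio can be illustrated on a toy signal. The sketch below is written with a naive $O(n^2)$ discrete Fourier transform from the standard library; the function names are hypothetical, time is indexed from $t = 0$ for convenience, and the test signal (two cosine/sine waves on 64 dates) is an assumption of the example, not the series of Figure 10.35.

```python
import cmath
import math


def dft(y):
    """d_y(lambda_j) = sum_t y_t exp(-i lambda_j t) with lambda_j = 2 pi j / n."""
    n = len(y)
    return [sum(v * cmath.exp(-2j * math.pi * j * t / n) for t, v in enumerate(y))
            for j in range(n)]


def inverse_dft(d):
    """y_t = n^{-1} sum_j d_j exp(i lambda_j t)."""
    n = len(d)
    return [sum(c * cmath.exp(2j * math.pi * j * t / n) for j, c in enumerate(d)) / n
            for t in range(n)]


def reconstruct(y, m):
    """Keep the m Fourier coefficients with the largest modulus and invert the transform."""
    d = dft(y)
    keep = set(sorted(range(len(d)), key=lambda j: abs(d[j]), reverse=True)[:m])
    return inverse_dft([c if j in keep else 0.0 for j, c in enumerate(d)])


def energy_ratio(y, m):
    """ER_m = sum_t |y_t^m|^2 / sum_t |y_t|^2."""
    y_m = reconstruct(y, m)
    return sum(abs(v) ** 2 for v in y_m) / sum(abs(v) ** 2 for v in y)


n_dates = 64
signal = [math.sin(2.0 * math.pi * 3 * t / n_dates) + 0.5 * math.cos(2.0 * math.pi * 10 * t / n_dates)
          for t in range(n_dates)]
for m in (1, 2, 4):
    print(m, round(energy_ratio(signal, m), 6))
```

For a real-valued series the coefficients come in conjugate pairs, so keeping a single coefficient of a pair produces a complex-valued reconstruction; the energy ratio is still well defined because it only involves moduli.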


58 This approach is also called subband coding or subband decomposition.
59 We assume that each year is composed of 250 trading days.

FIGURE 10.36: Reconstructed signal $y_t^m$

TABLE 10.13: Value of the energy ratio $\mathrm{ER}_m$ (in %)

  $m$                   1     2     3     4     5    10    25    50    75   100   2500
  $\mathrm{ER}_m$    34.8  69.5  73.8  78.1  79.9  87.4  94.4  96.5  97.3  97.5  100.0


Filtering theory Previously, we have implicitly used the concept of filtering in order to
extract the cycle component. More generally, the filtering technique consists in removing
some undesired frequencies. From a given signal $x_t$, we build another signal $y_t$ such that:

$$y_t = \varphi(L)\, x_t$$

$x_t$ is also called the input process, whereas $y_t$ is the output process. Since $\varphi(L)$ is independent
of time $t$, $\varphi(L)$ is a time-invariant filter. We distinguish four families of such
filters:

1. low-pass filters reduce components of high frequencies in order to get the dynamics
due to low frequencies; a famous example is the moving average:

$$y_t = \frac{1}{m}\sum_{k=0}^{m-1} x_{t-k}$$

2. high-pass filters reduce components of low frequencies in order to get the dynamics
due to high frequencies; the linear difference $y_t = x_t - x_{t-1}$ is an example of a high-pass
filter;

3. band-pass filters combine a low-pass filter with a high-pass filter; they can be used to
study the dynamics due to medium frequencies;

4. band-stop filters are the opposite of band-pass filters; therefore, they remove medium
frequencies.

We also make the distinction between causal and non-causal filters. In the case of a causal
filter, $\varphi(L)$ can be written as $\sum_{k=0}^{\infty} \varphi_k L^k$, meaning that $y_t$ does not depend on the future
values of $x_t$. If $\varphi(L)$ is linear, it is said to be a linear time-invariant (or LTI) filter.


We recall that the spectral density function of $y_t$ is:

$$f_y(\lambda) = \left|\varphi(e^{-i\lambda})\right|^2 f_x(\lambda) \qquad (10.63)$$

The function $\varphi(e^{-i\lambda})$ is known as the frequency response or transfer function. Equation
(10.63) can be seen as the frequency-domain version of the time-domain equality $\mathrm{var}(Y) =
\mathrm{var}(aX + b) = a^2\, \mathrm{var}(X)$. Let us consider the unit signal:

$$x_t = \begin{cases} 1 & \text{if } t = 0 \\ 0 & \text{otherwise} \end{cases}$$

We can then calculate the impulse response $y_t = \varphi(L)\, x_t$. If $y_t = 0$ for any date $t > t^\star$,
the impulse response is finite, and the filter is known as a finite impulse response (or FIR)
filter. An example is the moving average filter:

$$\varphi(L) = \frac{L^0 + L^1}{2} \;\Rightarrow\; \begin{cases} y_0 = 0.5 \\ y_1 = 0.5 \\ y_t = 0 & \text{for } t > 1 \end{cases}$$

Otherwise, we have an infinite impulse response (or IIR) filter. For instance, this is the case
of the AR(1) filter:

$$\varphi(L) = (1 - \phi L)^{-1} \;\Rightarrow\; y_t = \phi^t > 0$$


The output signal $y_t$ is the discrete convolution of the input signal $x_t$ and the vector $\varphi_t$
of the filter coefficients:

$$y_t = \varphi_t * x_t$$

Using the Fourier transform, we deduce that:

$$d_y(\lambda) = d_\varphi(\lambda)\, d_x(\lambda)$$

We can then obtain the output signal $y_t$ by multiplying the discrete Fourier transforms of
$\varphi_t$ and $x_t$, and taking the inverse Fourier transform of the product. For extracting a cycle,
we use the following transfer function:

$$d_\varphi(\lambda) = \begin{cases} 1 & \text{if } \lambda = \lambda_c \\ 0 & \text{otherwise} \end{cases}$$

For a low-pass filter, we can consider a transfer function such that:

$$d_\varphi(\lambda) = \begin{cases} 1 & \text{if } \lambda \in [0, \lambda^\star) \\ 0 & \text{if } \lambda \in [\lambda^\star, \pi] \end{cases}$$

Of course, we can specify more complicated filters. For example, the Hodrick-Prescott and
Baxter-King filters that are used for estimating the business cycle are respectively a high-pass
filter and a band-pass filter.
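
The notions of impulse response and squared gain can be made concrete with a short sketch. The function names and the parameterization (a list of moving average coefficients plus an optional first-order autoregressive coefficient) are assumptions of this example; it covers the two-term moving average, the AR(1) filter and the first difference.

```python
import cmath
import math


def impulse_response(ma_coeffs, ar_coeff=0.0, horizon=10):
    """Response of y_t = ar_coeff * y_{t-1} + sum_k ma_coeffs[k] * x_{t-k} to the unit signal."""
    y = []
    for t in range(horizon):
        x_part = ma_coeffs[t] if t < len(ma_coeffs) else 0.0
        previous = y[-1] if y else 0.0
        y.append(ar_coeff * previous + x_part)
    return y


def squared_gain(coeffs, lam):
    """Squared modulus of the transfer function of the causal FIR filter sum_k coeffs[k] L^k."""
    transfer = sum(c * cmath.exp(-1j * lam * k) for k, c in enumerate(coeffs))
    return abs(transfer) ** 2


# FIR example: two-term moving average; IIR example: AR(1) filter with coefficient 0.9.
print(impulse_response([0.5, 0.5], horizon=4))
print(impulse_response([1.0], ar_coeff=0.9, horizon=4))

# The moving average keeps frequency 0 and removes frequency pi (low-pass);
# the first difference does the opposite (high-pass).
print(squared_gain([0.5, 0.5], 0.0), squared_gain([0.5, 0.5], math.pi))
print(squared_gain([1.0, -1.0], 0.0), squared_gain([1.0, -1.0], math.pi))
```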


Remark 137 Time domain analysis consists in localizing common and residual patterns 
associated to a signal in the time scale. Frequency domain analysis (or spectral analysis) 
does the same job by considering a frequency scale. The wavelet analysis combines the two 
approaches to study the signal in the time-frequency domain. 


10.3 Exercises 


10.3.1 Probability distribution of the t-statistic in the case of the linear 
regression model 


We consider the linear regression model:

$$y_i = x_i^\top \beta + u_i$$

where $y_i$ is a scalar, $x_i$ is a $K \times 1$ vector and $u_i$ is a random variable. By considering a
sample of $n$ observations, the matrix form of the linear regression model is:

$$Y = X\beta + U$$

We assume the standard assumptions: $H_1 : U \sim \mathcal{N}(0, \sigma^2 I_n)$, $H_2 : X$ is an $n \times K$ matrix
of known values, $H_3 : \mathrm{rank}(X) = K$ and $H_4 : \lim n^{-1}(X^\top X) = Q$. We note $\hat\beta$ the OLS
estimator and $H = X(X^\top X)^{-1}X^\top$ the hat matrix.

1. Show that the matrices $H$ and $L = I_n - H$ are symmetric and idempotent.

2. Show that $LX = 0$ and $X^\top L = 0$. Deduce that $\hat U = LU$.

3. Calculate $\mathrm{trace}(L)$ and $\mathrm{rank}(L)$.

4. We note $\mathrm{RSS}(\hat\beta)$ the residual sum of squares. Show that $\hat\sigma^2 = (n - K)^{-1}\,\mathrm{RSS}(\hat\beta)$ is
an unbiased estimator of $\sigma^2$.

5. By considering the normalized random vector $V = (\sigma I_n)^{-1} U$, find the probability
distribution of $\hat\sigma^2$.

6. Show that $\hat\beta$ and $\hat U$ are independent.

7. Find the probability distribution of the t-statistic $t(\hat\beta_j)$.


10.3.2 Linear regression without a constant 


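We consider the linear model $y_i = x_i^\top \beta + \varepsilon_i$ where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $\mathbb{E}[\varepsilon_i \varepsilon_j] = 0$.


1. Write this model in the matrix form: $Y = X\beta + \varepsilon$. Compute the sum of squared
residuals $\varepsilon^\top \varepsilon$. Deduce that the least squares estimator $\hat\beta = \arg\min \varepsilon^\top \varepsilon$ is the solution
of a quadratic programming problem.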


2. We assume that the linear model does not contain an intercept. 


(a) Show that the residuals are not centered. 


(b) Write the quadratic programming problem associated to the least squares esti- 
mator if we impose that the residuals are centered. 


(c) Transform the previous optimization problem with explicit constraints into an 
optimization problem with implicit constraints. Deduce the analytical solution. 


10.3.3 Linear regression with linear constraints 
1. We consider the linear regression: 
$$Y = X\beta + U$$

(a) Let $\mathrm{RSS}(\beta) = U^\top U$ denote the residual sum of squares. Calculate $\mathrm{RSS}(\beta)$ with
respect to $Y$, $X$ and $\beta$.

(b) Deduce the OLS estimator:

$$\hat\beta = \arg\min \mathrm{RSS}(\beta)$$

(c) We assume that $U \sim \mathcal{N}(0, \sigma^2 I_n)$. Show that $\hat\beta$ is an unbiased estimator. Deduce
the variance of $\hat\beta$.


2. We now introduce a system $\mathcal{C}$ of constraints defined by:

$$\mathcal{C} = \begin{cases} A\beta = B \\ C\beta \ge D \end{cases}$$

TABLE 10.14: Numerical example

$$X^\top X = \begin{pmatrix}
28.88 & 23.15 & 20.75 & 21.60 & 22.97 \\
23.15 & 35.01 & 24.73 & 25.14 & 24.73 \\
20.75 & 24.73 & 28.23 & 22.42 & 21.63 \\
21.60 & 25.14 & 22.42 & 32.22 & 24.17 \\
22.97 & 24.73 & 21.63 & 24.17 & 33.10
\end{pmatrix}
\qquad
X^\top Y = \begin{pmatrix} 100.53 \\ 136.62 \\ 128.20 \\ 146.07 \\ 117.01 \end{pmatrix}$$

(a) Show that the constrained estimator $\hat\beta$ is the solution of a quadratic programming
problem.


(b) We consider the numerical example given in Table 10.14.
i. Calculate $\hat\beta$ when $\sum_{i=1}^{5} \beta_i = 1$.
ii. Calculate 8 when 81 = b2 = Bs. 
iii. Calculate $\hat\beta$ when $\beta_1 \ge \beta_2 \ge \beta_3 \ge \beta_4 \ge \beta_5$.
iv. We assume that $\beta_1 \le \beta_2 \le \beta_3 \le \beta_4 \le \beta_5$ and $\sum_{i=1}^{5} \beta_i = 1$. Verify that:


Deduce the values of the Lagrange coefficients for the inequality constraints
given that the Lagrange coefficient of the equality constraint is equal to
-192.36304.


3. We assume that $A\beta = B$.


(a) Write the Lagrange function and deduce the constrained OLS estimator $\hat\beta$.


(b) Show that we can write the explicit constraints $A\beta = B$ into implicit constraints
$\beta = C\gamma + D$. Write the constrained residual sum of squares $\mathrm{RSS}(\gamma)$. Deduce the
expressions of the estimator $\hat\gamma$ and the constrained estimator $\hat\beta$.


(c) Verify the coherence between the two estimators.
(d) Verify that we obtain the same estimates in the case 6; = 82 and bı = Bs +1. 


10.3.4 Maximum likelihood estimation of the Poisson distribution 
We consider a sample $Y = \{y_1, \ldots, y_n\}$ generated by the Poisson distribution $\mathcal{P}(\lambda)$.
1. Find the MLE of $\lambda$.


2. Calculate the information matrix $\mathcal{I}(\lambda)$. Deduce the variance of $\hat\lambda$. Compare this expression
with the direct computation based on the Hessian matrix.




10.3.5 Maximum likelihood estimation of the exponential distribution 


We consider a sample $Y = \{y_1, \ldots, y_n\}$ generated by the Exponential distribution $\mathcal{E}(\lambda)$.


1. Find the MLE of $\lambda$.


2. Calculate the information matrix $\mathcal{I}(\lambda)$. Deduce the variance of $\hat\lambda$. Compare this expression
with the direct computation based on the Hessian matrix.


10.3.6 Relationship between the linear regression and the maximum like- 
lihood method 


We consider the standard linear regression model: 
$$y_i = x_i^\top \beta + u_i$$
where $u_i \sim \mathcal{N}(0, \sigma^2)$.
1. Write the log-likelihood function $\ell(\theta)$ associated to the sample $Y = (y_1, \ldots, y_n)$.
2. Find the ML estimator 6. What is the relationship between Over, and Bois? 


3. Compute var (4.01). 


10.3.7 The Gaussian mixture model 


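We consider the mixture $Y$ of two independent Gaussian random variables $Y_1$ and $Y_2$.
The distribution function of $Y$ is:

$$f(y) = \pi_1 f_1(y) + \pi_2 f_2(y)$$

where $Y_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$, $Y_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$ and $\pi_1 + \pi_2 = 1$.


1. Show that:

$$\mathbb{E}[Y^k] = \pi_1\, \mathbb{E}[Y_1^k] + \pi_2\, \mathbb{E}[Y_2^k]$$

2. Deduce $\mathbb{E}[Y]$ and $\mathrm{var}(Y)$.


3. Find the expression of the skewness coefficient $\gamma_1(Y)$.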


10.3.8 Parameter estimation of diffusion processes 


We consider a sample $X = \{x_0, x_1, \ldots, x_T\}$ of the diffusion process $X(t)$, which is
observed at irregular times $t = \{t_0, t_1, \ldots, t_T\}$. Therefore, we have $x_i = X(t_i)$.


1. Find the expression of the log-likelihood function associated to the geometric Brownian
motion:

$$\mathrm{d}X(t) = \mu X(t)\,\mathrm{d}t + \sigma X(t)\,\mathrm{d}W(t)$$

2. Same question with the Ornstein-Uhlenbeck process:

$$\mathrm{d}X(t) = a\,(b - X(t))\,\mathrm{d}t + \sigma\,\mathrm{d}W(t)$$

3. We consider the general SDE:

$$\mathrm{d}X(t) = \mu(t, X(t))\,\mathrm{d}t + \sigma(t, X(t))\,\mathrm{d}W(t)$$

Calculate the log-likelihood function associated to the Euler-Maruyama scheme. Apply
this result to the Cox-Ingersoll-Ross process:

$$\mathrm{d}X(t) = a\,(b - X(t))\,\mathrm{d}t + \sigma\sqrt{X(t)}\,\mathrm{d}W(t)$$

Compare this approach with the quasi-maximum likelihood (QML) estimation by
assuming conditional normality of innovation processes.


4. Find the orthogonality conditions of the GMM approach for the GBM, OU and CIR
processes.


5. Same question if we consider the CKLS process (Chan et al., 1992):

$$\mathrm{d}X(t) = a\,(b - X(t))\,\mathrm{d}t + \sigma\,|X(t)|^{\gamma}\,\mathrm{d}W(t)$$


10.3.9 The Tobit model 


1. Give the first two moments of the truncated random variable $X \mid X \ge c$ where
$X \sim \mathcal{N}(\mu, \sigma^2)$.


2. Calculate the first two moments of the censored random variable $Y = \max(X, c)$.


3. Illustrate the difference between truncation and censoring when the parameters are
$\mu = 2$, $\sigma = 3$ and $c = 1$.


We consider the Tobit model defined as follows:

$$y_i = \max(0, y_i^\star), \qquad y_i^\star = x_i^\top \beta + u_i$$

where $u_i \sim \mathcal{N}(0, \sigma^2)$. We have a linear model between the latent variable $y_i^\star$ and $K$ explanatory
variables $x_i$. However, we do not directly observe the data $\{x_i, y_i^\star\}$. Indeed, we
observe the data $\{x_i, y_i\}$, implying that the dependent variable $y_i$ is censored.


4. Write the log-likelihood function $\ell(\theta)$ where $\theta = (\beta, \sigma^2)$.
5. Find the first-order conditions of the ML estimator $\hat\theta$.
6. Calculate the Hessian matrix $H(\theta)$ associated to the log-likelihood function $\ell(\theta)$.


7. Show that the information matrix has the following representation (Amemiya, 1973):

$$\mathcal{I}(\theta) = \begin{pmatrix}
\sum_{i=1}^{n} a_i x_i x_i^\top & \sum_{i=1}^{n} b_i x_i \\
\sum_{i=1}^{n} b_i x_i^\top & \sum_{i=1}^{n} c_i
\end{pmatrix}$$

where $a_i$, $b_i$ and $c_i$ are three scalars to define.


8. Show that the OLS estimator based on non-censored data is biased. 


9. Compute the conditional expectations $\mathbb{E}[y_i \mid y_i > 0]$ and $\mathbb{E}[y_i \mid y_i \le 0]$, and the unconditional
expectation $\mathbb{E}[y_i]$. Propose an OLS estimator of $\beta$ and compare it with the
ML estimator.

10. We consider the data given in Table 10.15. Using the method of maximum likelihood,
estimate the Tobit model:

$$y_i = \max(0, y_i^\star), \qquad y_i^\star = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + u_i$$

where $u_i \sim \mathcal{N}(0, \sigma^2)$. Calculate the OLS estimates based on the non-censored data.
Verify that: 


B(OLS) _ 6 (ML) (x! X1) x] AML) _ Bint) 


and: 


5 . 1 4 
B(OLS) _ @(ML) (x] Xı) x] Mi 4 AML) 


Comment on these results. 


11. How to calculate the predicted value ğž given that we know if the observation is 
censured or not? Compare the numerical value of y¥ with the unconditional predicted 
value YF. 


TABLE 10.15: Data of the Tobit example 


i I 2 3 4 5 6 7 8 9 10 
y 40 00 05 00 00 174 180 00 00 97 
zii —4.3 -9.2 —28 —27 -84 20 53 -81 0.9 -78 
zə; —1.2 -55 18 -34 29 93 91 68 -63 43 

i ll 2 B #4 #15 #16 17 +~ «18 +~ 19 20 


10.3.10 Derivation of Kalman filter equations 


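We consider the standard state space model described on page 647:

$$\begin{cases} y_t = Z_t \alpha_t + d_t + \varepsilon_t \\ \alpha_t = T_t \alpha_{t-1} + c_t + R_t \eta_t \end{cases}$$

1. Show that the prior estimates are given by the following relationships:

$$\begin{cases} \hat\alpha_{t|t-1} = T_t\, \hat\alpha_{t-1|t-1} + c_t \\ P_{t|t-1} = T_t P_{t-1|t-1} T_t^\top + R_t Q_t R_t^\top \end{cases}$$

2. Deduce that the innovation $v_t = y_t - \mathbb{E}_{t-1}[y_t]$ is a centered Gaussian random vector,
whose covariance matrix $F_t$ is equal to:

$$F_t = Z_t P_{t|t-1} Z_t^\top + H_t$$

3. Show that the joint distribution of the random vector $(\alpha_t, v_t)$ conditionally to the
filtration $\mathcal{F}_{t-1}$ is equal to:

$$\begin{pmatrix} \alpha_t \\ v_t \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \hat\alpha_{t|t-1} \\ 0 \end{pmatrix}, \begin{pmatrix} P_{t|t-1} & P_{t|t-1} Z_t^\top \\ Z_t P_{t|t-1} & F_t \end{pmatrix} \right)$$

4. Deduce that:

$$\hat\alpha_{t|t} = \mathbb{E}\left[\, \mathbb{E}_{t-1}[\alpha_t] \mid v_t = y_t - Z_t \hat\alpha_{t|t-1} - d_t \right]$$

and:

$$\begin{cases} \hat\alpha_{t|t} = \hat\alpha_{t|t-1} + P_{t|t-1} Z_t^\top F_t^{-1} \left( y_t - Z_t \hat\alpha_{t|t-1} - d_t \right) \\ P_{t|t} = P_{t|t-1} - P_{t|t-1} Z_t^\top F_t^{-1} Z_t P_{t|t-1} \end{cases}$$

5. Summarize the equations of the Kalman filter. Deduce that there exists a matrix $K_t$
such that:

$$\hat\alpha_{t+1|t} = T_{t+1}\, \hat\alpha_{t|t-1} + c_{t+1} + K_t v_t$$

Rewrite the state space model as an innovation process. What is the interpretation of
$K_t$?


6. Show that the state space model can be written as:

$$\begin{cases} y_t = Z_t^\star \alpha_t^\star \\ \alpha_t^\star = T_t^\star \alpha_{t-1}^\star + R_t^\star \eta_t^\star \end{cases}$$

Define the state vector $\alpha_t^\star$, the matrices $Z_t^\star$, $T_t^\star$ and $R_t^\star$, and the random vector $\eta_t^\star$.


7. Deduce the Kalman filter when the white noise processes $\varepsilon_t$ and $\eta_t$ are correlated:

$$\mathbb{E}\left[\varepsilon_t \eta_t^\top\right] = C_t$$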


10.3.11 Steady state of time-invariant state space model 


We note $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$.
1. Calculate the steady state of the SSM associated to the AR(1) process:
$$y_t = \mu + \phi_1 y_{t-1} + \varepsilon_t$$
2. Calculate the steady state of the SSM associated to the MA(1) process:
$$y_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1}$$
3. Calculate the steady state of the SSM associated to the ARMA(1,1) process:
$$y_t = \mu + \phi_1 y_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}$$
4. Calculate the steady state of the SSM associated to the process: 


Ye = U+rue
Ut, = H ir + €¢

10.3.12 Kalman information filter versus Kalman covariance filter


We assume that all square matrices are invertible and all non-square matrices have a
Moore-Penrose pseudo-inverse. We note $A$, $B$, $C$ and $D$ four matrices of dimension $m \times m$,
$n \times m$, $n \times n$ and $m \times m$.


1. Show that:

$$\left(I_m + A B^\top C^{-1} B\right)^{-1} A = \left(A^{-1} + B^\top C^{-1} B\right)^{-1}$$

2. Verify the following relationship:

$$\left(I_m + A B^\top C^{-1} B\right)^{-1} = I_m - A B^\top \left(C + B A B^\top\right)^{-1} B$$

3. Deduce that:

$$\left(I_m + A B^\top C^{-1} B\right)^{-1} A B^\top C^{-1} = A B^\top \left(C + B A B^\top\right)^{-1}$$

4. Calculate $\left(I_m + D^{-1} A\right)\left(A + D\right)^{-1}$.

We consider the following state space model:

$$\begin{cases} y_t = Z_t \alpha_t + \varepsilon_t \\ \alpha_t = T_t \alpha_{t-1} + R_t \eta_t \end{cases}$$

It corresponds to a special case of the model described on page 647 when there are no
constants $c_t$ and $d_t$ in state and measurement equations.


5. Define the concept of information matrix. 


6. We introduce the notations Ij, = Pro Iya = Pay È eos = 


A : d Sy > 
Tyjz-1@ejz-1- How do you interpret the vectors Orie and 4 Âi- 


= Lijtâtt and Ore 1 


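7. By using the results of Questions 1-4, show that:

$$\mathcal{I}_{t|t}\, P_{t|t-1} = I_m + Z_t^\top H_t^{-1} Z_t P_{t|t-1}$$

Deduce that:

$$\mathcal{I}_{t|t}\, P_{t|t-1} Z_t^\top \left( Z_t P_{t|t-1} Z_t^\top + H_t \right)^{-1} = Z_t^\top H_t^{-1}$$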


8. Verify that the recursive equations of the information filter are: 


Tee 1= (Ta ilt- Te + RQR) 


âile- 1 = le- 1T 1jt—1 âi- 1|t—1 
Tee = Mee 1+ Z Hy es, 
ar + Z Ap 


tt — Oe 1 


What advantages would you see in using the Kalman information filter rather than 
the Kalman covariance filter? 


9. We assume that the probability distribution of $\alpha_0$ is diffuse. Give the log-likelihood
function of the sample $\{y_1, \ldots, y_T\}$ when considering the Kalman information filter.
How to take into account the diffuse assumption when considering the Kalman covariance
filter?

10.3.13 Granger representation theorem
We assume that $y_t$ is a VAR($p$) process:

$$\Phi(L)\, y_t = \mu_t + \varepsilon_t$$

where:

$$\Phi(L) = I_n - \Phi_1 L - \ldots - \Phi_p L^p$$

1. We consider the case $p = 1$. Show that:

$$\Delta y_t = \mu_t + (\Phi_1 - I_n)\, y_{t-1} + \varepsilon_t$$

2. We consider the case $p = 2$. Show that:

$$\Delta y_t = \mu_t + (\Phi_1 + \Phi_2 - I_n)\, y_{t-1} - \Phi_2 \Delta y_{t-1} + \varepsilon_t$$

3. Verify that the expression of $\Delta y_t$ in the general case is equal to:

$$\Delta y_t = \mu_t + \Pi\, y_{t-1} + \sum_{i=1}^{p-1} \Phi_i^\star \Delta y_{t-i} + \varepsilon_t$$

where $\Pi = -\Phi(1) = \sum_{i=1}^{p} \Phi_i - I_n$ and $\Phi_i^\star = -\sum_{j=i+1}^{p} \Phi_j$.


10.3.14 Probability distribution of the periodogram 


Let $y_t$ be a stationary centered process. We decompose the periodogram as the sum of
two parts:

$$I_y(\lambda_j) = \frac{1}{2\pi n}\left|\sum_{t=1}^{n} y_t e^{-i\lambda_j t}\right|^2 = \frac{a^2(\lambda_j) + b^2(\lambda_j)}{2\pi}$$

where $a(\lambda_j) = n^{-1/2}\sum_{t=1}^{n} y_t \cos(\lambda_j t)$ and $b(\lambda_j) = n^{-1/2}\sum_{t=1}^{n} y_t \sin(\lambda_j t)$.
1. We assume that $y_t \sim \mathcal{N}(0, \sigma^2)$. Show that⁶⁰:

$$\lim_{n \to \infty} a_n(\lambda_j) \sim \mathcal{N}\left(0, \frac{\sigma^2}{2}\right)$$

and:

$$\lim_{n \to \infty} b_n(\lambda_j) \sim \mathcal{N}\left(0, \frac{\sigma^2}{2}\right)$$

2. Verify that $a(\lambda_j)$ and $b(\lambda_j)$ are asymptotically independent. Deduce the probability
distribution of $I_y(\lambda_j)$.


60 We recall that:

$$\lim_{n \to \infty}\left(\frac{1}{n}\sum_{t=1}^{n}\cos(\alpha t)\right) = 0$$

when $\alpha \ne 0$.

3. More generally, we assume that $\lim_{n \to \infty} 2 f_y^{-1}(\lambda_j)\, I_y(\lambda_j) \sim \chi^2_2$ for any stationary
centered process $y_t$. Calculate the first two moments of $I_y(\lambda_j)$ and the 95% confidence
interval of $f_y(\lambda_j)$.


4. We consider Question 2 when $\lambda_j = 0$. What is the probability distribution of $I_y(0)$?
Formulate a hypothesis about the probability distribution of $I_y(0)$ for all stationary
centered processes $y_t$. Show that $\lim_{n \to \infty} \mathbb{E}[I_y(0)] = f_y(0)$ and $\lim_{n \to \infty} \mathrm{var}(I_y(0)) =
2 f_y^2(0)$.


10.3.15 Spectral density function of structural time series models 


We consider the following models: 


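(M1)
$$\begin{cases} y_t = \mu_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \eta_t \end{cases}$$

(M2)
$$\begin{cases} y_t = \mu_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t \\ \beta_t = \beta_{t-1} + \zeta_t \end{cases}$$

(M3)
$$\begin{cases} y_t = \mu_t + \beta_t + \gamma_t + \varepsilon_t \\ \mu_t = \mu_{t-1} + \eta_t \\ \beta_t = \phi\,\beta_{t-1} + \zeta_t \\ \sum_{i=0}^{s-1} \gamma_{t-i} = \omega_t \end{cases}$$

where $\varepsilon_t$, $\eta_t$, $\zeta_t$ and $\omega_t$ are independent white noise processes with variances $\sigma_\varepsilon^2$, $\sigma_\eta^2$, $\sigma_\zeta^2$ and $\sigma_\omega^2$.
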
1. Write Models (M1) and (M2) in the state space form. 


2. Find the stationary form of these two processes and calculate their spectral density 
function. 


3. Illustrate graphically the difference between these spectral density functions. 
4. We consider Model (M3). Give an interpretation of the components $\mu_t$, $\beta_t$ and $\gamma_t$.

5. Show that a stationary form of $y_t$ is:
$$z_t = (1 - L)(1 - L^s)\, y_t$$

6. Give another stationary form of $y_t$.

7. Find the spectral density function of $z_t$.

10.3.16 Spectral density function of some processes

Calculate the spectral density function of the following processes:

1. $y_t$ is a periodic random walk process:
$$y_t = y_{t-s} + \varepsilon_t$$
where $s > 1$ and $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$.

2. $y_t$ is a fractional white noise process:
$$(1 - L)^d\, y_t = \varepsilon_t$$
where $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$.

3. $z_t$ is the sum of an AR(1) process and a MA(1) process:
$$\begin{cases} z_t = x_t + y_t \\ x_t = \phi\, x_{t-1} + u_t \\ y_t = v_t - \theta\, v_{t-1} \end{cases}$$
where $u_t \sim \mathcal{N}(0, \sigma_u^2)$, $v_t \sim \mathcal{N}(0, \sigma_v^2)$ and $u_t \perp v_t$.

(a) Simulate a sample $z$ of 1 000 observations with the following parameters: $\phi = 0.75$,
$\theta = 0.2$, $\sigma_u = 1$ and $\sigma_v = 0.5$. Draw the periodogram of $z$.

(b) Estimate the parameters $\phi$, $\sigma_u$, $\theta$ and $\sigma_v$ by using the method of Whittle. Compare
the estimated spectral density function with the periodogram of $z$ and the
theoretical spectral density function.

Copulas and Dependence Modeling


One of the main challenges in risk management is the aggregation of individual risks. We can 
move the issue aside by assuming that the random variables modeling individual risks are 
independent or are only dependent by means of a common risk factor. The problem becomes 
much more involved when one wants to model fully dependent random variables. Again a 
classic solution is to assume that the vector of individual risks follows a multivariate normal
distribution. However, not all risks are likely to be well described by a Gaussian random
vector, and the normal distribution may fail to capture some features of the dependence
between individual risks.


Copula functions are a statistical tool to solve the previous issue. A copula function is 
nothing else but the joint distribution of a vector of uniform random variables. Since it is 
always possible to map any random vector into a vector of uniform random variables, we 
are able to split the marginals and the dependence between the random variables. There- 
fore, a copula function represents the statistical dependence between random variables, and 
generalizes the concept of correlation when the random vector is not Gaussian. 


11.1 Canonical representation of multivariate distributions 


The concept of copula was introduced by Sklar in 1959. For a long time, only
a small number of people used copula functions, more in the field of mathematics
than in statistics. The publication of Genest and MacKay (1986b) in the American
Statistician marked a breakthrough and opened areas of study in empirical modeling, statistics
and econometrics. In what follows, we make intensive use of the materials developed in the books
of Joe (1997) and Nelsen (2006).


11.1.1 Sklar’s theorem 


Nelsen (2006) defines a bi-dimensional copula (or a 2-copula) as a function C which 
satisfies the following properties: 


1. $\operatorname{Dom} C = [0,1] \times [0,1]$;
2. $C(0,u) = C(u,0) = 0$ and $C(1,u) = C(u,1) = u$ for all $u$ in $[0,1]$;
3. $C$ is 2-increasing:
$$C(v_1,v_2) - C(v_1,u_2) - C(u_1,v_2) + C(u_1,u_2) \ge 0$$
for all $(u_1,u_2) \in [0,1]^2$, $(v_1,v_2) \in [0,1]^2$ such that $0 \le u_1 \le v_1 \le 1$ and $0 \le u_2 \le v_2 \le 1$.

This definition means that $C$ is a cumulative distribution function with uniform marginals:
$$C(u_1,u_2) = \Pr\{U_1 \le u_1, U_2 \le u_2\}$$
where $U_1$ and $U_2$ are two uniform random variables.


Example 108 Let us consider the function $C^{\perp}(u_1,u_2) = u_1 u_2$. We have $C^{\perp}(0,u) =
C^{\perp}(u,0) = 0$ and $C^{\perp}(1,u) = C^{\perp}(u,1) = u$. Since we have $v_2 - u_2 \ge 0$ and $v_1 \ge u_1$, it
follows that $v_1(v_2 - u_2) \ge u_1(v_2 - u_2)$ and $v_1 v_2 + u_1 u_2 - u_1 v_2 - v_1 u_2 \ge 0$. We deduce that
$C^{\perp}$ is a copula function. It is called the product copula.


Let $F_1$ and $F_2$ be any two univariate distributions. It is obvious that $F(x_1,x_2) =
C(F_1(x_1), F_2(x_2))$ is a probability distribution with marginals $F_1$ and $F_2$. Indeed,
$u_i = F_i(x_i)$ defines a uniform transformation ($u_i \in [0,1]$). Moreover, we verify that
$C(F_1(x_1), F_2(\infty)) = C(F_1(x_1), 1) = F_1(x_1)$. Copulas are then a powerful tool to build a
multivariate probability distribution when the marginals are given. Conversely, Sklar (1959)
proves that any bivariate distribution $F$ admits such a representation:

$$F(x_1,x_2) = C(F_1(x_1), F_2(x_2)) \qquad (11.1)$$

and that the copula $C$ is unique provided the marginals are continuous. This result is
important, because we can associate to each bivariate distribution a copula function. We
then obtain a canonical representation of a bivariate probability distribution: on one side,
we have the marginals or the univariate directions $F_1$ and $F_2$; on the other side, we have the
copula $C$ that links these marginals and gives the dependence between the unidimensional
directions.


Example 109 The Gumbel logistic distribution is the function $F(x_1,x_2) = (1 + e^{-x_1} + e^{-x_2})^{-1}$
defined on $\mathbb{R}^2$. We notice that the marginals are $F_1(x_1) = F(x_1,\infty) = (1 + e^{-x_1})^{-1}$ and
$F_2(x_2) = (1 + e^{-x_2})^{-1}$. The quantile functions are then $F_1^{-1}(u_1) = \ln u_1 - \ln(1 - u_1)$ and
$F_2^{-1}(u_2) = \ln u_2 - \ln(1 - u_2)$. We finally deduce that:
$$C(u_1,u_2) = F\left(F_1^{-1}(u_1), F_2^{-1}(u_2)\right) = \frac{u_1 u_2}{u_1 + u_2 - u_1 u_2}$$
is the Gumbel logistic copula.
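
The computation of Example 109 can be checked numerically. The following Python sketch (function names are ours and purely illustrative) evaluates $F(F_1^{-1}(u_1), F_2^{-1}(u_2))$ on a grid and compares it with the closed-form expression $u_1 u_2 / (u_1 + u_2 - u_1 u_2)$:

```python
import math

def joint_cdf(x1, x2):
    """Gumbel logistic distribution F(x1, x2)."""
    return 1.0 / (1.0 + math.exp(-x1) + math.exp(-x2))

def logistic_quantile(u):
    """Quantile function of the logistic marginal: ln u - ln(1 - u)."""
    return math.log(u) - math.log(1.0 - u)

def copula_from_sklar(u1, u2):
    """Copula obtained by plugging the quantiles into the joint cdf."""
    return joint_cdf(logistic_quantile(u1), logistic_quantile(u2))

def copula_closed_form(u1, u2):
    """Closed-form Gumbel logistic copula."""
    return u1 * u2 / (u1 + u2 - u1 * u2)

# Largest discrepancy between the two expressions on a 99 x 99 grid
grid = [i / 100 for i in range(1, 100)]
max_gap = max(
    abs(copula_from_sklar(a, b) - copula_closed_form(a, b))
    for a in grid for b in grid
)
print(max_gap)
```

The gap should be of the order of machine precision, since both expressions are algebraically identical.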


11.1.2 Expression of the copula density 


If the joint distribution function $F(x_1,x_2)$ is absolutely continuous, we obtain:

$$
\begin{aligned}
f(x_1,x_2) &= \partial_{1,2}\, F(x_1,x_2) \\
&= \partial_{1,2}\, C(F_1(x_1), F_2(x_2)) \\
&= c(F_1(x_1), F_2(x_2)) \cdot f_1(x_1) \cdot f_2(x_2) \qquad (11.2)
\end{aligned}
$$

where $f(x_1,x_2)$ is the joint probability density function, $f_1$ and $f_2$ are the marginal densities
and $c$ is the copula density:
$$c(u_1,u_2) = \partial_{1,2}\, C(u_1,u_2)$$
We notice that the condition $C(v_1,v_2) - C(v_1,u_2) - C(u_1,v_2) + C(u_1,u_2) \ge 0$ is then
equivalent to $\partial_{1,2}\, C(u_1,u_2) \ge 0$ when the copula density exists.


Example 110 In the case of the Gumbel logistic copula, we obtain $c(u_1,u_2) =
2u_1u_2 / (u_1 + u_2 - u_1u_2)^3$. We easily verify the 2-increasing property.
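
From Equation (11.2), we deduce that:

$$c(u_1,u_2) = \frac{f\left(F_1^{-1}(u_1), F_2^{-1}(u_2)\right)}{f_1\left(F_1^{-1}(u_1)\right) \cdot f_2\left(F_2^{-1}(u_2)\right)} \qquad (11.3)$$

We obtain a second canonical representation based on density functions. For some copulas,
there is no explicit analytical formula. This is the case of the Normal copula, which is
equal to $C(u_1,u_2;\rho) = \Phi\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2); \rho\right)$. Using Equation (11.3), we can however
characterize its density function:

$$
\begin{aligned}
c(u_1,u_2;\rho) &= \frac{(2\pi)^{-1}\left(1-\rho^2\right)^{-1/2} \exp\left(-\dfrac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{2\left(1-\rho^2\right)}\right)}{(2\pi)^{-1/2}\exp\left(-\frac{1}{2}x_1^2\right) \cdot (2\pi)^{-1/2}\exp\left(-\frac{1}{2}x_2^2\right)} \\
&= \frac{1}{\sqrt{1-\rho^2}} \exp\left(-\frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{2\left(1-\rho^2\right)} + \frac{1}{2}\left(x_1^2 + x_2^2\right)\right)
\end{aligned}
$$

where $x_1 = F_1^{-1}(u_1) = \Phi^{-1}(u_1)$ and $x_2 = F_2^{-1}(u_2) = \Phi^{-1}(u_2)$. It is then easy to generate bivariate non-normal
distributions.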
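
As an illustration of Equation (11.3), here is a minimal Python sketch of the Normal copula density. It only relies on the standard library (`statistics.NormalDist`); the function names are ours. The second function evaluates the ratio of the bivariate normal density to the product of the marginal densities, so that the simplified expression can be checked against the definition:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)

def normal_copula_density(u1, u2, rho):
    """Simplified closed-form density of the Normal copula."""
    x1, x2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    q = (x1 * x1 + x2 * x2 - 2.0 * rho * x1 * x2) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q + 0.5 * (x1 * x1 + x2 * x2)) / math.sqrt(1.0 - rho * rho)

def density_ratio(u1, u2, rho):
    """Equation (11.3): joint normal density over the product of marginals."""
    x1, x2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    q = (x1 * x1 + x2 * x2 - 2.0 * rho * x1 * x2) / (2.0 * (1.0 - rho * rho))
    joint = math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    return joint / (STD_NORMAL.pdf(x1) * STD_NORMAL.pdf(x2))

gap = abs(normal_copula_density(0.3, 0.8, 0.5) - density_ratio(0.3, 0.8, 0.5))
independent_case = normal_copula_density(0.2, 0.7, 0.0)  # product copula: density 1
print(gap, independent_case)
```

When $\rho = 0$, the density is identically equal to 1, which corresponds to the product copula.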


Example 111 In Figure 11.1, we have built a bivariate probability distribution by consid- 
ering that the marginals are an inverse Gaussian distribution and a beta distribution. The 
copula function corresponds to the Normal copula such that its Kendall’s tau is equal to 
50%. 


[Figure 11.1 panels: density of $F_1(x_1)$, $\mathcal{IG}(2, 1.5)$; density of $F_2(x_2)$, $\mathcal{B}(2,2)$; density of $C(F_1(x_1), F_2(x_2))$]

FIGURE 11.1: Example of a bivariate probability distribution with given marginals


11.1.3 Fréchet classes 


The goal of Fréchet classes is to study the structure of the class of distributions with
given marginals. The latter can be unidimensional, multidimensional or conditional. Let
us consider the bivariate distribution functions $F_{12}$ and $F_{23}$. The Fréchet class $\mathcal{F}(F_{12}, F_{23})$
is the set of trivariate probability distributions that are compatible with the two bivariate
marginals $F_{12}$ and $F_{23}$. In this handbook, we restrict our focus to the Fréchet class
$\mathcal{F}(F_1,\ldots,F_n)$ with univariate marginals.


11.1.3.1 The bivariate case 


Let us first consider the bivariate case. The distribution function $F$ belongs to the Fréchet
class $(F_1, F_2)$ and we note $F \in \mathcal{F}(F_1, F_2)$ if and only if the marginals of $F$ are $F_1$ and $F_2$,
meaning that $F(x_1,\infty) = F_1(x_1)$ and $F(\infty,x_2) = F_2(x_2)$. Characterizing the Fréchet
class $\mathcal{F}(F_1, F_2)$ is then equivalent to finding the set $\mathcal{C}$ of copula functions:

$$\mathcal{F}(F_1, F_2) = \left\{F : F(x_1,x_2) = C(F_1(x_1), F_2(x_2)),\ C \in \mathcal{C}\right\}$$

Therefore this problem does not depend on the marginals $F_1$ and $F_2$.


We can show that the extremal distribution functions $F^-$ and $F^+$ of the Fréchet class
$\mathcal{F}(F_1, F_2)$ are:
$$F^-(x_1,x_2) = \max(F_1(x_1) + F_2(x_2) - 1, 0)$$
and:
$$F^+(x_1,x_2) = \min(F_1(x_1), F_2(x_2))$$

$F^-$ and $F^+$ are called the Fréchet lower and upper bounds. We deduce that the corresponding
copula functions are:
$$C^-(u_1,u_2) = \max(u_1 + u_2 - 1, 0)$$
and:
$$C^+(u_1,u_2) = \min(u_1, u_2)$$
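
The Fréchet bounds are easy to check on a given copula. The following Python sketch (illustrative names, pure standard library) verifies on a grid that the Gumbel logistic copula $u_1 u_2 / (u_1 + u_2 - u_1 u_2)$ of Example 109 lies between $C^-$ and $C^+$:

```python
def lower_frechet(u1, u2):
    """Frechet lower bound C^-(u1, u2) = max(u1 + u2 - 1, 0)."""
    return max(u1 + u2 - 1.0, 0.0)

def upper_frechet(u1, u2):
    """Frechet upper bound C^+(u1, u2) = min(u1, u2)."""
    return min(u1, u2)

def gumbel_logistic(u1, u2):
    """Gumbel logistic copula of Example 109."""
    return u1 * u2 / (u1 + u2 - u1 * u2)

# The grid excludes 0 to avoid the 0/0 form at the origin
grid = [i / 50 for i in range(1, 51)]
tol = 1e-12  # tolerance for floating-point rounding
within_bounds = all(
    lower_frechet(a, b) - tol <= gumbel_logistic(a, b) <= upper_frechet(a, b) + tol
    for a in grid for b in grid
)
print(within_bounds)
```

A grid check is of course not a proof; it only illustrates the inequality $C^- \le C \le C^+$ on a finite set of points.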


Example 112 We consider the Fréchet class $\mathcal{F}(F_1, F_2)$ where $F_1 \sim \mathcal{N}(0,1)$ and $F_2 \sim
\mathcal{N}(0,1)$. We know that the bivariate normal distribution with correlation $\rho$ belongs to
$\mathcal{F}(F_1, F_2)$. Nevertheless, a lot of bivariate non-normal distributions are also in this Fréchet
class. For instance, this is the case of this probability distribution:

$$F(x_1,x_2) = \frac{\Phi(x_1)\,\Phi(x_2)}{\Phi(x_1) + \Phi(x_2) - \Phi(x_1)\,\Phi(x_2)}$$

We can also show that¹:
$$F^-(x_1,x_2) := \Phi(x_1,x_2;-1) = \max(\Phi(x_1) + \Phi(x_2) - 1, 0)$$
and:
$$F^+(x_1,x_2) := \Phi(x_1,x_2;+1) = \min(\Phi(x_1), \Phi(x_2))$$

Therefore, the bounds of the Fréchet class $\mathcal{F}(\mathcal{N}(0,1), \mathcal{N}(0,1))$ correspond to the bivariate
normal distribution, whose correlation is respectively equal to $-1$ and $+1$.


¹We recall that:
$$\Phi(x_1,x_2;\rho) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} \phi(y_1,y_2;\rho)\, \mathrm{d}y_1\, \mathrm{d}y_2$$

11.1.3.2 The multivariate case


The extension of bivariate copulas to multivariate copulas is straightforward. Thus, the
canonical decomposition of a multivariate distribution function is:

$$F(x_1,\ldots,x_n) = C(F_1(x_1),\ldots,F_n(x_n))$$

We note $C_{\mathcal{E}}$ the sub-copula of $C$ such that arguments that are not in the set $\mathcal{E}$ are equal to 1.
For instance, with a dimension of 4, we have $C_{12}(u,v) = C(u,v,1,1)$ and $C_{124}(u,v,w) =
C(u,v,1,w)$. Let us consider the 2-copulas $C_1$ and $C_2$. It seems logical to build a copula of
higher dimension with copulas of lower dimensions. In fact, the function $C_1(u_1, C_2(u_2,u_3))$
is not a copula in most cases (Quesada Molina and Rodriguez Lallena, 1994). For instance,
we have:

$$
\begin{aligned}
C^-\left(u_1, C^-(u_2,u_3)\right) &= \max(u_1 + \max(u_2 + u_3 - 1, 0) - 1, 0) \\
&= \max(u_1 + u_2 + u_3 - 2, 0) \\
&= C^-(u_1,u_2,u_3)
\end{aligned}
$$

However, the function $C^-(u_1,u_2,u_3)$ is not a copula.


In the multivariate case, we define:
$$C^-(u_1,\ldots,u_n) = \max\left(\sum_{i=1}^{n} u_i - n + 1, 0\right)$$
and:
$$C^+(u_1,\ldots,u_n) = \min(u_1,\ldots,u_n)$$

As discussed above, we can show that $C^+$ is a copula, but $C^-$ does not belong to the set
$\mathcal{C}$. Nevertheless, $C^-$ is the best-possible bound, meaning that for all $(u_1,\ldots,u_n) \in [0,1]^n$,
there is a copula that coincides with $C^-$ at this point (Nelsen, 2006). This implies that $\mathcal{F}(F_1,\ldots,F_n)$ has
a minimal distribution function if and only if $\max\left(\sum_{i=1}^{n} F_i(x_i) - n + 1, 0\right)$ is a probability
distribution (Dall'Aglio, 1972).


11.1.3.3 Concordance ordering 


Using the result of the previous paragraph, we have:
$$C^-(u_1,u_2) \le C(u_1,u_2) \le C^+(u_1,u_2)$$

for all $C \in \mathcal{C}$. For a given value $\alpha \in [0,1]$, the level curves of $C$ are then in the triangle
defined as follows:

$$\{(u_1,u_2) : \max(u_1 + u_2 - 1, 0) \le \alpha,\ \min(u_1,u_2) \ge \alpha\}$$

An illustration is shown in Figure 11.2. In the multidimensional case, the region becomes an
n-volume.


We now introduce a stochastic ordering on copulas. Let $C_1$ and $C_2$ be two copula
functions. We say that the copula $C_1$ is smaller than the copula $C_2$ and we note $C_1 \prec C_2$
if we verify that $C_1(u_1,u_2) \le C_2(u_1,u_2)$ for all $(u_1,u_2) \in [0,1]^2$. This stochastic ordering
is called the concordance ordering and may be viewed as the first order of the stochastic
dominance on probability distributions.

FIGURE 11.2: The triangle region of the contour lines $C(u_1,u_2) = \alpha$


Example 113 This ordering is partial because we cannot compare all copula functions. Let
us consider the cubic copula defined by $C(u_1,u_2;\theta) = u_1u_2 + \theta\,[u_1(u_1-1)(2u_1-1)]\,[u_2(u_2-1)(2u_2-1)]$
where $\theta \in [-1,2]$. If we compare it to the product copula $C^{\perp}$, we have:

$$C\left(\tfrac{3}{4},\tfrac{3}{4};1\right) = 0.5712 \ge C^{\perp}\left(\tfrac{3}{4},\tfrac{3}{4}\right) = 0.5625$$

$$C\left(\tfrac{3}{4},\tfrac{1}{4};1\right) = 0.1787 \le C^{\perp}\left(\tfrac{3}{4},\tfrac{1}{4}\right) = 0.1875$$
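
The two comparisons of Example 113 can be reproduced with a few lines of Python (a minimal sketch; the function names are ours):

```python
def product_copula(u1, u2):
    """Product copula: C(u1, u2) = u1 * u2."""
    return u1 * u2

def cubic_copula(u1, u2, theta):
    """Cubic copula with parameter theta in [-1, 2]."""
    def g(u):
        return u * (u - 1.0) * (2.0 * u - 1.0)
    return u1 * u2 + theta * g(u1) * g(u2)

# Cubic copula above the product copula at (3/4, 3/4) ...
above = (cubic_copula(0.75, 0.75, 1.0), product_copula(0.75, 0.75))
# ... and below it at (3/4, 1/4): the two copulas cannot be ordered
below = (cubic_copula(0.75, 0.25, 1.0), product_copula(0.75, 0.25))
print(above, below)
```

Since the sign of the difference changes from one point to the other, neither $C \prec C^{\perp}$ nor $C \succ C^{\perp}$ holds.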


Using the Fréchet bounds, we always have $C^- \prec C^{\perp} \prec C^+$. A copula $C$ has a positive
quadrant dependence (PQD) if it satisfies the inequality $C^{\perp} \prec C \prec C^+$. In a similar way, $C$
has a negative quadrant dependence (NQD) if it satisfies the inequality $C^- \prec C \prec C^{\perp}$. As
it is a partial ordering, there exist copula functions $C$ such that $C \nsucc C^{\perp}$ and $C \nprec C^{\perp}$. A
copula function may then have a dependence structure that is neither positive nor negative.
This is the case of the cubic copula given in the previous example. In Figure 11.3, we report
the cumulative distribution function (above panel) and its contour lines (right panel) of the
three copula functions $C^-$, $C^{\perp}$ and $C^+$, which play an important role in understanding the
dependence between unidimensional risks.


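Let $C_\theta(u_1,u_2) = C(u_1,u_2;\theta)$ be a family of copula functions that depends on the
parameter $\theta$. The copula family $\{C_\theta\}$ is totally ordered if, for all $\theta_2 \ge \theta_1$, $C_{\theta_2} \succ C_{\theta_1}$
(positively ordered) or $C_{\theta_2} \prec C_{\theta_1}$ (negatively ordered). For instance, the Frank copula
defined by:
$$C(u_1,u_2;\theta) = -\frac{1}{\theta}\ln\left(1 + \frac{\left(e^{-\theta u_1} - 1\right)\left(e^{-\theta u_2} - 1\right)}{e^{-\theta} - 1}\right)$$

where $\theta \in \mathbb{R}$ is a positively ordered family (see Figure 11.4).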
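
The positive ordering of the Frank family can be illustrated numerically. The sketch below (illustrative names; the set of parameter values is an arbitrary choice of ours) evaluates the Frank copula on a grid for increasing values of $\theta$ and checks that the copula values do not decrease:

```python
import math

def frank_copula(u1, u2, theta):
    """Frank copula for theta != 0, written with expm1/log1p for accuracy."""
    num = math.expm1(-theta * u1) * math.expm1(-theta * u2)
    return -math.log1p(num / math.expm1(-theta)) / theta

thetas = [-10.0, -2.0, 0.5, 2.0, 10.0]   # increasing parameter values
grid = [i / 20 for i in range(1, 20)]     # interior points of [0, 1]

# C_{theta_lo}(u1, u2) <= C_{theta_hi}(u1, u2) at every grid point
positively_ordered = all(
    frank_copula(a, b, lo) <= frank_copula(a, b, hi) + 1e-12
    for lo, hi in zip(thetas, thetas[1:])
    for a in grid for b in grid
)
near_product = frank_copula(0.3, 0.6, 1e-6)  # theta -> 0 gives the product copula
print(positively_ordered, near_product)
```

When $\theta$ tends to 0, the Frank copula tends to the product copula, which is why the last value is close to $0.3 \times 0.6 = 0.18$.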

FIGURE 11.3: The three copula functions $C^-$, $C^{\perp}$ and $C^+$


FIGURE 11.4: Concordance ordering of the Frank copula


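Example 114 Let us consider the copula function $C_\theta = \theta \cdot C^- + (1 - \theta) \cdot C^+$ where $0 \le
\theta \le 1$. This copula is a convex sum of the extremal copulas $C^-$ and $C^+$. When $\theta_2 \ge \theta_1$, we
have:

$$
\begin{aligned}
C_{\theta_2}(u_1,u_2) &= \theta_2 \cdot C^-(u_1,u_2) + (1 - \theta_2) \cdot C^+(u_1,u_2) \\
&= C_{\theta_1}(u_1,u_2) - (\theta_2 - \theta_1) \cdot \left(C^+(u_1,u_2) - C^-(u_1,u_2)\right) \\
&\le C_{\theta_1}(u_1,u_2)
\end{aligned}
$$

We deduce that $C_{\theta_2} \prec C_{\theta_1}$. This copula family is negatively ordered.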


11.2 Copula functions and random vectors

Let $X = (X_1, X_2)$ be a random vector with distribution $F$. We define the copula of
$(X_1, X_2)$ by the copula of $F$:
$$F(x_1,x_2) = C\langle X_1, X_2\rangle\left(F_1(x_1), F_2(x_2)\right)$$

In what follows, we give the main results on the dependence of the random vector $X$ found
in Deheuvels (1978), Schweizer and Wolff (1981), and Nelsen (2006).


11.2.1 Countermonotonicity, comonotonicity and scale invariance property


We give here a probabilistic interpretation of the three copula functions $C^-$, $C^{\perp}$ and
$C^+$:


• $X_1$ and $X_2$ are countermonotonic (or $C\langle X_1, X_2\rangle = C^-$) if there exists a random
variable $X$ such that $X_1 = f_1(X)$ and $X_2 = f_2(X)$ where $f_1$ and $f_2$ are respectively
decreasing and increasing functions²;


• $X_1$ and $X_2$ are independent if the dependence function is the product copula $C^{\perp}$;


• $X_1$ and $X_2$ are comonotonic (or $C\langle X_1, X_2\rangle = C^+$) if there exists a random variable
$X$ such that $X_1 = f_1(X)$ and $X_2 = f_2(X)$ where $f_1$ and $f_2$ are both increasing
functions³.

Let us consider a uniform random vector $(U_1, U_2)$. We have $U_2 = 1 - U_1$ when
$C\langle X_1, X_2\rangle = C^-$ and $U_2 = U_1$ when $C\langle X_1, X_2\rangle = C^+$. In the case of a standardized
Gaussian random vector, we obtain $X_2 = -X_1$ when $C\langle X_1, X_2\rangle = C^-$ and $X_2 = X_1$
when $C\langle X_1, X_2\rangle = C^+$. If the marginals are log-normal, it follows that $X_2 = X_1^{-1}$ when
$C\langle X_1, X_2\rangle = C^-$ and $X_2 = X_1$ when $C\langle X_1, X_2\rangle = C^+$. For these three examples, we verify
that $X_2$ is a decreasing (resp. increasing) function of $X_1$ if the copula function $C\langle X_1, X_2\rangle$ is
$C^-$ (resp. $C^+$). The concepts of counter- and comonotonicity generalize the cases
where the linear correlation of a Gaussian vector is equal to $-1$ or $+1$. Indeed, $C^-$ and $C^+$
define respectively perfect negative and positive dependence.


²We also have $X_2 = f(X_1)$ where $f = f_2 \circ f_1^{-1}$ is a decreasing function.
³In this case, $X_2 = f(X_1)$ where $f = f_2 \circ f_1^{-1}$ is an increasing function.

We now give one of the most important theorems on copulas. Let $(X_1, X_2)$ be a random
vector, whose copula is $C\langle X_1, X_2\rangle$. If $h_1$ and $h_2$ are two strictly increasing functions on $\operatorname{Im} X_1$ and
$\operatorname{Im} X_2$, then we have:

$$C\langle h_1(X_1), h_2(X_2)\rangle = C\langle X_1, X_2\rangle$$

This means that copula functions are invariant under strictly increasing transformations of
the random variables. To prove this theorem, we note $F$ and $G$ the probability distributions
of the random vectors $(X_1, X_2)$ and $(Y_1, Y_2) = (h_1(X_1), h_2(X_2))$. The marginals of $G$ are:
$$
\begin{aligned}
G_1(y_1) &= \Pr\{Y_1 \le y_1\} \\
&= \Pr\{h_1(X_1) \le y_1\} \\
&= \Pr\{X_1 \le h_1^{-1}(y_1)\} \quad \text{(because } h_1 \text{ is strictly increasing)} \\
&= F_1\left(h_1^{-1}(y_1)\right)
\end{aligned}
$$
and $G_2(y_2) = F_2\left(h_2^{-1}(y_2)\right)$. We deduce that $G_1^{-1}(u_1) = h_1\left(F_1^{-1}(u_1)\right)$ and $G_2^{-1}(u_2) =
h_2\left(F_2^{-1}(u_2)\right)$. By definition, we have:
$$C\langle Y_1, Y_2\rangle(u_1,u_2) = G\left(G_1^{-1}(u_1), G_2^{-1}(u_2)\right)$$
Moreover, it follows that:
$$
\begin{aligned}
G\left(G_1^{-1}(u_1), G_2^{-1}(u_2)\right) &= \Pr\{Y_1 \le G_1^{-1}(u_1), Y_2 \le G_2^{-1}(u_2)\} \\
&= \Pr\{h_1(X_1) \le G_1^{-1}(u_1), h_2(X_2) \le G_2^{-1}(u_2)\} \\
&= \Pr\{X_1 \le h_1^{-1}\left(G_1^{-1}(u_1)\right), X_2 \le h_2^{-1}\left(G_2^{-1}(u_2)\right)\} \\
&= \Pr\{X_1 \le F_1^{-1}(u_1), X_2 \le F_2^{-1}(u_2)\} \\
&= F\left(F_1^{-1}(u_1), F_2^{-1}(u_2)\right)
\end{aligned}
$$
Because we have $C\langle X_1, X_2\rangle(u_1,u_2) = F\left(F_1^{-1}(u_1), F_2^{-1}(u_2)\right)$, we deduce that $C\langle Y_1, Y_2\rangle =
C\langle X_1, X_2\rangle$.
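
The scale invariance property has a simple empirical counterpart: strictly increasing transformations leave the ranks of a sample unchanged, and therefore any rank-based statistic such as Kendall's tau. The following Python sketch illustrates this on simulated data; the sample size, the seed, the dependence parameters and the two transformations (exponential and cube) are arbitrary choices of ours:

```python
import math
import random

def ranks(values):
    """Rank of each observation (1 = smallest); the sample has no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau(x, y):
    """Sample Kendall's tau: (concordant - discordant) pairs / total pairs."""
    n = len(x)
    score = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return 2.0 * score / (n * (n - 1))

rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(300)]
x2 = [0.6 * a + 0.8 * rng.gauss(0.0, 1.0) for a in x1]  # correlated with x1

# Strictly increasing transformations h1 = exp and h2 = cube
y1 = [math.exp(a) for a in x1]
y2 = [a ** 3 for a in x2]

same_ranks = ranks(x1) == ranks(y1) and ranks(x2) == ranks(y2)
tau_x, tau_y = kendall_tau(x1, x2), kendall_tau(y1, y2)
print(same_ranks, tau_x, tau_y)
```

The empirical copula only depends on the ranks, so it is the same for $(X_1, X_2)$ and $(Y_1, Y_2)$, whereas the linear correlation of the two samples generally differs.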


Example 115 If $X_1$ and $X_2$ are two positive random variables, the previous theorem implies
that:
$$
\begin{aligned}
C\langle X_1, X_2\rangle &= C\langle \ln X_1, X_2\rangle \\
&= C\langle \ln X_1, \ln X_2\rangle \\
&= C\langle X_1, \exp X_2\rangle \\
&= C\langle \sqrt{X_1}, \exp X_2\rangle
\end{aligned}
$$

Applying an increasing transformation does not change the copula function, only the
marginals. Thus, the copula of the multivariate log-normal distribution is the same as
the copula of the multivariate normal distribution.


The scale invariance property is perhaps not surprising if we consider the canonical
decomposition of the bivariate probability distribution. Indeed, the copula $C\langle U_1, U_2\rangle$ is
equal to the copula $C\langle X_1, X_2\rangle$ where $U_1 = F_1(X_1)$ and $U_2 = F_2(X_2)$. In some sense, Sklar's
theorem is an application of the scale invariance property by considering $h_1(x_1) = F_1(x_1)$
and $h_2(x_2) = F_2(x_2)$.


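Example 116 We assume that $X_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $X_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$. If the copula of
$(X_1, X_2)$ is $C^-$, we have $U_2 = 1 - U_1$. This implies that:

$$\Phi\left(\frac{X_2 - \mu_2}{\sigma_2}\right) = 1 - \Phi\left(\frac{X_1 - \mu_1}{\sigma_1}\right) = \Phi\left(-\frac{X_1 - \mu_1}{\sigma_1}\right)$$

We deduce that $X_1$ and $X_2$ are countermonotonic if:
$$X_2 = \mu_2 - \frac{\sigma_2}{\sigma_1}(X_1 - \mu_1)$$

By applying the same reasoning to the copula function $C^+$, we show that $X_1$ and $X_2$ are
comonotonic if:
$$X_2 = \mu_2 + \frac{\sigma_2}{\sigma_1}(X_1 - \mu_1)$$

We now consider the log-normal random variables $Y_1 = \exp(X_1)$ and $Y_2 = \exp(X_2)$. For
the countermonotonicity case, we obtain:
$$\ln Y_2 = \mu_2 - \frac{\sigma_2}{\sigma_1}(\ln Y_1 - \mu_1)$$
or:
$$Y_2 = \exp\left(\mu_2 + \frac{\sigma_2}{\sigma_1}\mu_1\right) Y_1^{-\sigma_2/\sigma_1}$$

For the comonotonicity case, the relationship becomes:
$$Y_2 = \exp\left(\mu_2 - \frac{\sigma_2}{\sigma_1}\mu_1\right) Y_1^{\sigma_2/\sigma_1}$$

If we assume that $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2$, the log-normal random variables $Y_1$ and $Y_2$ are
countermonotonic if $Y_2 = e^{2\mu_1}\, Y_1^{-1}$ (which reduces to $Y_2 = Y_1^{-1}$ when $\mu_1 = 0$) and comonotonic if $Y_2 = Y_1$.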


11.2.2 Dependence measures 


We can interpret the copula function C (X1, X2) as a standardization of the joint dis- 
tribution after eliminating the effects of marginals. Indeed, it is a comprehensive statistic 
of the dependence function between X, and X2. Therefore, a non-comprehensive statistic 
will be a dependence measure if it can be expressed using C (X1, X2). 


11.2.2.1 Concordance measures 


Following Nelsen (2006), a numeric measure $m$ of association between $X_1$ and $X_2$ is a
measure of concordance if it satisfies the following properties:


1. $-1 = m\langle X, -X\rangle \le m\langle C\rangle \le m\langle X, X\rangle = 1$;
2. $m\langle C^{\perp}\rangle = 0$;
3. $m\langle -X_1, X_2\rangle = m\langle X_1, -X_2\rangle = -m\langle X_1, X_2\rangle$;
4. if $C_1 \prec C_2$, then $m\langle C_1\rangle \le m\langle C_2\rangle$.


Using this last property, we have: $C \prec C^{\perp} \Rightarrow m\langle C\rangle \le 0$ and $C \succ C^{\perp} \Rightarrow m\langle C\rangle \ge 0$. The
concordance measure can then be viewed as a generalization of the linear correlation when
the dependence function is not normal. Indeed, a positive quadrant dependence (PQD)
copula will have a positive concordance measure whereas a negative quadrant dependence
(NQD) copula will have a negative concordance measure. Moreover, the bounds $-1$ and $+1$
are reached when the copula function is countermonotonic and comonotonic.

Among the several concordance measures, we find Kendall's tau and Spearman's rho,
which play an important role in non-parametric statistics. Let us consider a sample of
$n$ observations $\{(x_1,y_1),\ldots,(x_n,y_n)\}$ of the random vector $(X,Y)$. Kendall's tau is the
probability of concordance, $(X_i - X_j)\cdot(Y_i - Y_j) > 0$, minus the probability of discordance,
$(X_i - X_j)\cdot(Y_i - Y_j) < 0$:

$$\tau = \Pr\{(X_i - X_j)\cdot(Y_i - Y_j) > 0\} - \Pr\{(X_i - X_j)\cdot(Y_i - Y_j) < 0\}$$

Spearman's rho is the linear correlation of the rank statistics $(X_{i:n}, Y_{i:n})$. We can also show
that Spearman's rho has the following expression:

$$\varrho = \frac{\operatorname{cov}\left(F_X(X), F_Y(Y)\right)}{\sigma\left(F_X(X)\right)\cdot\sigma\left(F_Y(Y)\right)}$$

Schweizer and Wolff (1981) showed that Kendall's tau and Spearman's rho are concordance
measures and have the following expressions:

$$\tau = 4\iint_{[0,1]^2} C(u_1,u_2)\, \mathrm{d}C(u_1,u_2) - 1$$

$$\varrho = 12\iint_{[0,1]^2} u_1 u_2\, \mathrm{d}C(u_1,u_2) - 3$$

From a numerical point of view, the following formulas should be preferred (Nelsen, 2006):

$$\tau = 1 - 4\iint_{[0,1]^2} \partial_{u_1} C(u_1,u_2)\, \partial_{u_2} C(u_1,u_2)\, \mathrm{d}u_1\, \mathrm{d}u_2$$

$$\varrho = 12\iint_{[0,1]^2} C(u_1,u_2)\, \mathrm{d}u_1\, \mathrm{d}u_2 - 3$$

For some copulas, we have analytical formulas. For instance, we have:

Copula    $\varrho$                                                        $\tau$
Normal    $6\pi^{-1}\arcsin(\rho/2)$                                       $2\pi^{-1}\arcsin(\rho)$
Gumbel    (no closed form)                                                 $(\theta - 1)/\theta$
FGM       $\theta/3$                                                       $2\theta/9$
Frank     $1 - 12\theta^{-1}\left(D_1(\theta) - D_2(\theta)\right)$        $1 - 4\theta^{-1}\left(1 - D_1(\theta)\right)$

where $D_k(x)$ is the Debye function. The Gumbel (or Gumbel-Hougaard) copula is equal to:
$$C(u_1,u_2;\theta) = \exp\left(-\left[(-\ln u_1)^{\theta} + (-\ln u_2)^{\theta}\right]^{1/\theta}\right)$$

for $\theta \ge 1$, whereas the expression of the Farlie-Gumbel-Morgenstern (or FGM) copula is:
$$C(u_1,u_2;\theta) = u_1 u_2\left(1 + \theta\,(1 - u_1)(1 - u_2)\right)$$

for $-1 \le \theta \le 1$.
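
The numerical formulas of Kendall's tau and Spearman's rho can be illustrated with the FGM copula, for which the analytical values $\tau = 2\theta/9$ and $\varrho = \theta/3$ are available. The following Python sketch uses a basic midpoint rule on a regular grid; the grid size and the value $\theta = 0.8$ are arbitrary choices of ours, and a dedicated quadrature routine would be preferable in practice:

```python
def fgm(u1, u2, theta):
    """FGM copula C(u1, u2; theta)."""
    return u1 * u2 * (1.0 + theta * (1.0 - u1) * (1.0 - u2))

def fgm_du1(u1, u2, theta):
    """Partial derivative of the FGM copula with respect to u1."""
    return u2 * (1.0 + theta * (1.0 - u2) * (1.0 - 2.0 * u1))

def fgm_du2(u1, u2, theta):
    """Partial derivative of the FGM copula with respect to u2."""
    return u1 * (1.0 + theta * (1.0 - u1) * (1.0 - 2.0 * u2))

def midpoint_integral(f, n=200):
    """Midpoint rule for the double integral of f over [0, 1]^2."""
    h = 1.0 / n
    points = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(f(a, b) for a in points for b in points)

theta = 0.8
spearman = 12.0 * midpoint_integral(lambda a, b: fgm(a, b, theta)) - 3.0
kendall = 1.0 - 4.0 * midpoint_integral(
    lambda a, b: fgm_du1(a, b, theta) * fgm_du2(a, b, theta)
)
print(spearman, theta / 3.0)      # numerical value vs theta / 3
print(kendall, 2.0 * theta / 9.0)  # numerical value vs 2 * theta / 9
```

Both numerical values agree with the analytical formulas up to the discretization error of the midpoint rule.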

For illustration, we report in Figures 11.5, 11.6 and 11.7 the level curves of several density
functions built with Normal, Frank and Gumbel copulas. In order to compare them, the
parameter of each copula is calibrated such that Kendall's tau is equal to 50%. This means
that these 12 distribution functions have the same dependence with respect to Kendall's
tau. However, the dependence is different from one figure to another, because their copula
function is not the same. This is why Kendall's tau is not an exhaustive statistic of the
dependence between two random variables.


FIGURE 11.5: Contour lines of bivariate densities (Normal copula)

FIGURE 11.6: Contour lines of bivariate densities (Frank copula)

FIGURE 11.7: Contour lines of bivariate densities (Gumbel copula)


We could build bivariate probability distributions, which are even less comparable. Indeed,
the set of these three copula families (Normal, Frank and Gumbel) is very small
compared to the set $\mathcal{C}$ of copulas. However, there exist other dependence functions that are
very far from the previous copulas. For instance, we consider the region $\mathcal{B}(\tau,\varrho)$ defined by:

$$(\tau,\varrho) \in \mathcal{B}(\tau,\varrho) \Leftrightarrow \begin{cases} (3\tau - 1)/2 \le \varrho \le (1 + 2\tau - \tau^2)/2 & \text{if } \tau \ge 0 \\ (\tau^2 + 2\tau - 1)/2 \le \varrho \le (1 + 3\tau)/2 & \text{if } \tau \le 0 \end{cases}$$

Nelsen (2006) shows that these bounds cannot be improved and there is always a copula
function that corresponds to a point of the boundary $\mathcal{B}(\tau,\varrho)$. In Figure 11.8 we report
the bounds $\mathcal{B}(\tau,\varrho)$ and the area reached by 8 copula families (Normal, Plackett, Frank,
Clayton, Gumbel, Galambos, Hüsler-Reiss, FGM). These copulas cover a small surface
of the $(\tau,\varrho)$ region. These copula families are then relatively similar if we consider these
concordance measures. Obtaining copulas that have a different behavior requires that the
dependence is not monotone⁴ on the whole domain $[0,1]^2$.


11.2.2.2 Linear correlation 


We recall that the linear correlation (or Pearson's correlation) is defined as follows:

$$\rho\langle X_1, X_2\rangle = \frac{E[X_1 \cdot X_2] - E[X_1]\cdot E[X_2]}{\sigma(X_1)\cdot\sigma(X_2)}$$

Tchen (1980) showed the following properties of this measure:


• if the dependence of the random vector $(X_1, X_2)$ is the product copula $C^{\perp}$, then
$\rho\langle X_1, X_2\rangle = 0$;


• $\rho$ is an increasing function with respect to the concordance measure:
$$C_1 \succ C_2 \Rightarrow \rho_1\langle X_1, X_2\rangle \ge \rho_2\langle X_1, X_2\rangle$$


• $\rho\langle X_1, X_2\rangle$ is bounded:
$$\rho^-\langle X_1, X_2\rangle \le \rho\langle X_1, X_2\rangle \le \rho^+\langle X_1, X_2\rangle$$
and the bounds are reached for the Fréchet copulas $C^-$ and $C^+$.


⁴For instance, the dependence can be positive in one region and negative in another region.

[Figure 11.8 annotation: area attained with the 8 copula families; horizontal axis $\tau$]

FIGURE 11.8: Bounds of $(\tau,\varrho)$ statistics
However, a concordance measure must satisfy $m\langle C^-\rangle = -1$ and $m\langle C^+\rangle = +1$. If we use
the stochastic representation of Fréchet bounds, we have:

$$\rho^-\langle X_1, X_2\rangle = \rho^+\langle X_1, X_2\rangle = \frac{E[f_1(X)\cdot f_2(X)] - E[f_1(X)]\cdot E[f_2(X)]}{\sigma(f_1(X))\cdot\sigma(f_2(X))}$$

The solution of the equation $\rho^-\langle X_1, X_2\rangle = -1$ is $f_1(x) = a_1 x + b_1$ and $f_2(x) = a_2 x + b_2$
where $a_1 a_2 < 0$. For the equation $\rho^+\langle X_1, X_2\rangle = +1$, the condition becomes $a_1 a_2 > 0$.
Except for Gaussian random variables, there are few probability distributions that can
satisfy these conditions. Moreover, if the linear correlation is a concordance measure, it is
an invariant measure by increasing transformations:

$$\rho\langle X_1, X_2\rangle = \rho\langle f_1(X_1), f_2(X_2)\rangle$$

Again, the solution of this equation is $f_1(x) = a_1 x + b_1$ and $f_2(x) = a_2 x + b_2$ where
$a_1 a_2 > 0$. We now have a better understanding why we say that this dependence measure
is linear. In summary, the copula function generalizes the concept of linear correlation in a
non-Gaussian non-linear world.


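Example 117 We consider the bivariate log-normal random vector $(X_1, X_2)$ where $X_1 \sim
\mathcal{LN}(\mu_1, \sigma_1^2)$, $X_2 \sim \mathcal{LN}(\mu_2, \sigma_2^2)$ and $\rho = \rho\langle \ln X_1, \ln X_2\rangle$.


We can show that:
$$E\left[X_1^{p_1} X_2^{p_2}\right] = \exp\left(p_1\mu_1 + p_2\mu_2 + \frac{p_1^2\sigma_1^2 + p_2^2\sigma_2^2}{2} + p_1 p_2\, \rho\, \sigma_1\sigma_2\right)$$

It follows that:
$$\rho\langle X_1, X_2\rangle = \frac{\exp(\rho\,\sigma_1\sigma_2) - 1}{\sqrt{\exp\left(\sigma_1^2\right) - 1}\cdot\sqrt{\exp\left(\sigma_2^2\right) - 1}}$$

We deduce that $\rho\langle X_1, X_2\rangle \in [\rho^-, \rho^+]$, but the bounds are not necessarily $-1$ and $+1$. For
instance, when we use the parameters $\sigma_1 = 1$ and $\sigma_2 = 3$, we obtain the following results:

Copula          $\rho\langle X_1,X_2\rangle$   $\tau\langle X_1,X_2\rangle$   $\varrho\langle X_1,X_2\rangle$
$C^-$           -0.008     -1.000     -1.000
$\rho = -0.7$   -0.007     -0.494     -0.683
$C^{\perp}$      0.000      0.000      0.000
$\rho = 0.7$     0.061      0.494      0.683
$C^+$            0.162      1.000      1.000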

When the copula function is $C^-$, the linear correlation takes a value close to zero! In
Figure 11.9, we show that the bounds $\rho^-$ and $\rho^+$ of $\rho\langle X_1, X_2\rangle$ are not necessarily $-1$ and
$+1$. When the marginals are log-normal, the upper bound $\rho^+ = +1$ is reached only when
$\sigma_1 = \sigma_2$ and the lower bound $\rho^- = -1$ is never reached. This makes it difficult to interpret
the value of a correlation. Let us consider two random vectors $(X_1, X_2)$ and $(Y_1, Y_2)$. What
could we say about the dependence function when $\rho\langle X_1, X_2\rangle \ge \rho\langle Y_1, Y_2\rangle$? The answer is
nothing if the marginals are not Gaussian. Indeed, we have seen previously that a 70%
linear correlation between two Gaussian random variables becomes a 6% linear correlation if
we apply an exponential transformation (with $\sigma_1 = 1$ and $\sigma_2 = 3$). However, the two copulas of $(X_1, X_2)$ and $(Y_1, Y_2)$
are exactly the same. In fact, the drawback of the linear correlation is that this measure
depends on the marginals and not only on the copula function.
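
The figures of Example 117 can be reproduced from the closed-form expression of the log-normal correlation and from the Normal copula formulas $\tau = 2\pi^{-1}\arcsin(\rho)$ and $\varrho = 6\pi^{-1}\arcsin(\rho/2)$. The Python sketch below uses illustrative function names of ours; the cases $\rho = -1$ and $\rho = +1$ correspond to the copulas $C^-$ and $C^+$:

```python
import math

def lognormal_correlation(rho, sigma1, sigma2):
    """Linear correlation of (X1, X2) when (ln X1, ln X2) is Gaussian
    with correlation rho and volatilities sigma1 and sigma2."""
    numerator = math.exp(rho * sigma1 * sigma2) - 1.0
    denominator = math.sqrt(math.exp(sigma1 ** 2) - 1.0) * math.sqrt(
        math.exp(sigma2 ** 2) - 1.0
    )
    return numerator / denominator

def normal_copula_kendall(rho):
    """Kendall's tau of the Normal copula."""
    return 2.0 / math.pi * math.asin(rho)

def normal_copula_spearman(rho):
    """Spearman's rho of the Normal copula."""
    return 6.0 / math.pi * math.asin(rho / 2.0)

sigma1, sigma2 = 1.0, 3.0
table = {
    rho: (
        lognormal_correlation(rho, sigma1, sigma2),
        normal_copula_kendall(rho),
        normal_copula_spearman(rho),
    )
    for rho in (-1.0, -0.7, 0.0, 0.7, 1.0)
}
for rho, row in table.items():
    print(rho, [round(value, 3) for value in row])
```

The two rank-based measures span the whole interval $[-1, 1]$, whereas the linear correlation stays between about $-0.008$ and $0.162$ for these parameters.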


[Figure 11.9 legend: upper bound $\rho^+$ and lower bound $\rho^-$, with $\sigma_1 = 1$]

FIGURE 11.9: Bounds of the linear correlation between two log-normal random variables

11.2.2.3 Tail dependence


Contrary to concordance measures, tail dependence is a local measure that characterizes
the joint behavior of the random variables $X_1$ and $X_2$ at the extreme points
$x^- = \inf\{x : F(x) > 0\}$ and $x^+ = \sup\{x : F(x) < 1\}$. Let $C$ be a copula function such
that the following limit exists:

$$\lambda^+ = \lim_{u\to 1^-} \frac{1 - 2u + C(u,u)}{1 - u}$$

We say that $C$ has an upper tail dependence when $\lambda^+ \in (0,1]$ and $C$ has no upper tail
dependence when $\lambda^+ = 0$ (Joe, 1997). For the lower tail dependence $\lambda^-$, the limit becomes:

$$\lambda^- = \lim_{u\to 0^+} \frac{C(u,u)}{u}$$

We notice that $\lambda^+$ and $\lambda^-$ can also be defined as follows:

$$\lambda^+ = \lim_{u\to 1^-} \Pr\{U_2 > u \mid U_1 > u\}$$

and:
$$\lambda^- = \lim_{u\to 0^+} \Pr\{U_2 < u \mid U_1 < u\}$$


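To compute the upper tail dependence, we consider the joint survival function $\bar{C}$ defined
by:

$$
\begin{aligned}
\bar{C}(u_1,u_2) &= \Pr\{U_1 > u_1, U_2 > u_2\} \\
&= 1 - u_1 - u_2 + C(u_1,u_2)
\end{aligned}
$$

The expression of the upper tail dependence is then equal to (the second line follows from L'Hôpital's rule):

$$
\begin{aligned}
\lambda^+ &= \lim_{u\to 1^-} \frac{\bar{C}(u,u)}{1 - u} \\
&= -\lim_{u\to 1^-} \left(-2 + \partial_1 C(u,u) + \partial_2 C(u,u)\right) \\
&= \lim_{u\to 1^-} \left(\Pr\{U_2 > u \mid U_1 = u\} + \Pr\{U_1 > u \mid U_2 = u\}\right)
\end{aligned}
$$

By assuming that the copula is symmetric, we finally obtain:

$$
\begin{aligned}
\lambda^+ &= 2\lim_{u\to 1^-} \Pr\{U_2 > u \mid U_1 = u\} \\
&= 2 - 2\lim_{u\to 1^-} \Pr\{U_2 \le u \mid U_1 = u\} \\
&= 2 - 2\lim_{u\to 1^-} C_{2|1}(u,u) \qquad (11.4)
\end{aligned}
$$

In a similar way, we find that the lower tail dependence of a symmetric copula is equal to:

$$\lambda^- = 2\lim_{u\to 0^+} C_{2|1}(u,u) \qquad (11.5)$$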
u—0+ 
For the copula functions $C^-$ and $C^{\perp}$, we have $\lambda^- = \lambda^+ = 0$. For the copula $C^+$, we
obtain $\lambda^- = \lambda^+ = 1$. However, there exist copulas such that $\lambda^- \neq \lambda^+$. This is the case of the
Gumbel copula $C(u_1,u_2;\theta) = \exp\left(-\left[(-\ln u_1)^{\theta} + (-\ln u_2)^{\theta}\right]^{1/\theta}\right)$, because we have $\lambda^- =
0$ and $\lambda^+ = 2 - 2^{1/\theta}$. The Gumbel copula has then an upper tail dependence, but no lower
tail dependence. If we consider the Clayton copula $C(u_1,u_2;\theta) = \left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-1/\theta}$, we
obtain $\lambda^- = 2^{-1/\theta}$ and $\lambda^+ = 0$.
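
These limits can be approximated by evaluating the ratios $(1 - 2u + C(u,u))/(1 - u)$ and $C(u,u)/u$ close to the boundary. The following Python sketch does this for the Gumbel and Clayton copulas; the value $\theta = 2$ and the evaluation points $1 - 10^{-6}$ and $10^{-6}$ are arbitrary choices of ours:

```python
import math

def gumbel_copula(u1, u2, theta):
    """Gumbel copula, theta >= 1."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def clayton_copula(u1, u2, theta):
    """Clayton copula, theta > 0."""
    return (u1 ** (-theta) + u2 ** (-theta) - 1.0) ** (-1.0 / theta)

def upper_tail_ratio(copula, u, theta):
    """(1 - 2u + C(u, u)) / (1 - u), whose limit at 1 is lambda^+."""
    return (1.0 - 2.0 * u + copula(u, u, theta)) / (1.0 - u)

def lower_tail_ratio(copula, u, theta):
    """C(u, u) / u, whose limit at 0 is lambda^-."""
    return copula(u, u, theta) / u

theta = 2.0
gumbel_upper = upper_tail_ratio(gumbel_copula, 1.0 - 1e-6, theta)
clayton_lower = lower_tail_ratio(clayton_copula, 1e-6, theta)
clayton_upper = upper_tail_ratio(clayton_copula, 1.0 - 1e-6, theta)
print(gumbel_upper, 2.0 - 2.0 ** (1.0 / theta))   # close to 0.5858
print(clayton_lower, 2.0 ** (-1.0 / theta))       # close to 0.7071
print(clayton_upper)                               # close to 0
```

The convergence can be slow for some copulas (for instance, the lower ratio of the Gumbel copula only decreases to 0 like a power of $u$), so such finite evaluations are indicative only.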


Coles et al. (1999) define the quantile-quantile dependence function as follows:
$$\lambda^+(\alpha) = \Pr\{X_2 > F_2^{-1}(\alpha) \mid X_1 > F_1^{-1}(\alpha)\}$$

It is the conditional probability that $X_2$ is larger than the quantile $F_2^{-1}(\alpha)$ given that $X_1$
is larger than the quantile $F_1^{-1}(\alpha)$. We have:

$$
\begin{aligned}
\lambda^+(\alpha) &= \Pr\{X_2 > F_2^{-1}(\alpha) \mid X_1 > F_1^{-1}(\alpha)\} \\
&= \frac{\Pr\{X_2 > F_2^{-1}(\alpha), X_1 > F_1^{-1}(\alpha)\}}{\Pr\{X_1 > F_1^{-1}(\alpha)\}} \\
&= \frac{1 - \Pr\{X_1 \le F_1^{-1}(\alpha)\} - \Pr\{X_2 \le F_2^{-1}(\alpha)\} + \Pr\{X_2 \le F_2^{-1}(\alpha), X_1 \le F_1^{-1}(\alpha)\}}{1 - \Pr\{X_1 \le F_1^{-1}(\alpha)\}} \\
&= \frac{1 - 2\alpha + C(\alpha,\alpha)}{1 - \alpha}
\end{aligned}
$$

The tail dependence $\lambda^+$ is then the limit of the conditional probability $\lambda^+(\alpha)$ when the
confidence level $\alpha$ tends to 1. It is also the probability of one variable being extreme given
that the other is extreme. Because $\lambda^+(\alpha)$ is a probability, we verify that $\lambda^+ \in [0,1]$. If
the probability is zero, the extremes are independent. If $\lambda^+$ is equal to 1, the extremes
are perfectly dependent. To illustrate the measures⁵ $\lambda^+(\alpha)$ and $\lambda^-(\alpha)$, we represent their
values for the Gumbel and Clayton copulas in Figure 11.10. The parameters are calibrated
with respect to Kendall's tau.


Remark 138 We consider two portfolios, whose losses correspond to the random variables
$L_1$ and $L_2$ with probability distributions $F_1$ and $F_2$. The probability that the loss of the
second portfolio is larger than its value-at-risk knowing that the value-at-risk of the first
portfolio is exceeded is exactly equal to the quantile-quantile dependence measure $\lambda^+(\alpha)$:

$$
\begin{aligned}
\lambda^+(\alpha) &= \Pr\{L_2 > F_2^{-1}(\alpha) \mid L_1 > F_1^{-1}(\alpha)\} \\
&= \Pr\{L_2 > \operatorname{VaR}_\alpha(L_2) \mid L_1 > \operatorname{VaR}_\alpha(L_1)\}
\end{aligned}
$$

11.3 Parametric copula functions 


In this section, we study the copula families, which are commonly used in risk man- 
agement. They are parametric copulas, which depend on a set of parameters. Statistical 
inference, in particular parameter estimation, is developed in the next section. 


⁵ We have $\lambda^{-}(\alpha) = \Pr\left\{X_2 < F_2^{-1}(\alpha) \mid X_1 < F_1^{-1}(\alpha)\right\}$ and $\lim_{\alpha \to 0} \lambda^{-}(\alpha) = \lambda^{-}$.


FIGURE 11.10: Quantile-quantile dependence measures $\lambda^{+}(\alpha)$ and $\lambda^{-}(\alpha)$ (panels: Gumbel-Hougaard copula, Clayton copula)


11.3.1 Archimedean copulas 
11.3.1.1 Definition 
Genest and MacKay (1986b) define Archimedean copulas as follows: 


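\[
C(u_1,u_2) = \begin{cases} \varphi^{-1}\left(\varphi(u_1) + \varphi(u_2)\right) & \text{if } \varphi(u_1) + \varphi(u_2) \leq \varphi(0) \\ 0 & \text{otherwise} \end{cases}
\]

where $\varphi$ is a $C^2$ function which satisfies $\varphi(1) = 0$, $\varphi'(u) < 0$ and $\varphi''(u) > 0$ for all
$u \in [0,1]$. $\varphi(u)$ is called the generator of the copula function. If $\varphi(0) = \infty$, the generator
is said to be strict. Genest and MacKay (1986a) link the construction of Archimedean
copulas to the independence of random variables. Indeed, by considering the multiplicative
generator $\lambda(u) = \exp(-\varphi(u))$, the authors show that:
\[
C(u_1,u_2) = \lambda^{-1}\left(\lambda(u_1)\lambda(u_2)\right)
\]
This means that:
\[
\lambda\left(\Pr\{U_1 \leq u_1, U_2 \leq u_2\}\right) = \lambda\left(\Pr\{U_1 \leq u_1\}\right) \times \lambda\left(\Pr\{U_2 \leq u_2\}\right)
\]
In this case, the random variables $(U_1,U_2)$ become independent when the scale of probabilities
has been transformed.

Example 118 If $\varphi(u) = u^{-1} - 1$, we have $\varphi^{-1}(u) = (1+u)^{-1}$ and:
\[
C(u_1,u_2) = \left(1 + \left(u_1^{-1} - 1 + u_2^{-1} - 1\right)\right)^{-1} = \frac{u_1 u_2}{u_1 + u_2 - u_1 u_2}
\]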


The Gumbel logistic copula is then an Archimedean copula.

Example 119 The product copula $C^{\perp}$ is Archimedean and the associated generator is
$\varphi(u) = -\ln u$. Concerning Fréchet copulas, only $C^{-}$ is Archimedean with $\varphi(u) = 1 - u$.


In Table 11.1, we provide other examples of Archimedean copulas⁶.


TABLE 11.1: Archimedean copula functions 


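Copula        $\varphi(u)$                                        $C(u_1,u_2)$
$C^{\perp}$   $-\ln u$                                            $u_1 u_2$
Clayton       $u^{-\theta} - 1$                                   $\left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-1/\theta}$
Frank         $-\ln\frac{e^{-\theta u} - 1}{e^{-\theta} - 1}$     $-\frac{1}{\theta}\ln\left(1 + \frac{(e^{-\theta u_1} - 1)(e^{-\theta u_2} - 1)}{e^{-\theta} - 1}\right)$
Gumbel        $(-\ln u)^{\theta}$                                 $\exp\left(-\left(\tilde{u}_1^{\theta} + \tilde{u}_2^{\theta}\right)^{1/\theta}\right)$
Joe           $-\ln\left(1 - (1-u)^{\theta}\right)$               $1 - \left(\bar{u}_1^{\theta} + \bar{u}_2^{\theta} - \bar{u}_1^{\theta}\bar{u}_2^{\theta}\right)^{1/\theta}$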


11.3.1.2 Properties 


Archimedean copulas play an important role in statistics, because they present many 
interesting properties, for example: 


• $C$ is symmetric, meaning that $C(u_1,u_2) = C(u_2,u_1)$;

• $C$ is associative, implying that $C(u_1, C(u_2,u_3)) = C(C(u_1,u_2), u_3)$;

• the diagonal section $\delta(u) = C(u,u)$ satisfies $\delta(u) < u$ for all $u \in (0,1)$;

• if a copula $C$ is associative and $\delta(u) < u$ for all $u \in (0,1)$, then $C$ is Archimedean.

Genest and MacKay (1986a) also showed that the expression of Kendall's tau is:
\[
\tau = 1 + 4\int_0^1 \frac{\varphi(u)}{\varphi'(u)}\,\mathrm{d}u
\]
whereas the copula density is:
\[
c(u_1,u_2) = -\frac{\varphi''\left(C(u_1,u_2)\right)\varphi'(u_1)\varphi'(u_2)}{\left[\varphi'\left(C(u_1,u_2)\right)\right]^3}
\]


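Example 120 With the Clayton copula, we have $\varphi(u) = u^{-\theta} - 1$ and $\varphi'(u) = -\theta u^{-\theta-1}$.
We deduce that:
\[
\tau = 1 + 4\int_0^1 \frac{1 - u^{-\theta}}{\theta u^{-\theta-1}}\,\mathrm{d}u = \frac{\theta}{\theta + 2}
\]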
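The integral formula for Kendall's tau can be checked numerically for any generator. The sketch below is a minimal illustration, assuming a simple midpoint rule and hypothetical helper names; it evaluates $\tau = 1 + 4\int_0^1 \varphi(u)/\varphi'(u)\,\mathrm{d}u$ for the Clayton and Gumbel generators.

```python
import math


def kendall_tau_from_generator(phi, dphi, n_steps=20000):
    """tau = 1 + 4 * integral over (0, 1) of phi(u) / phi'(u), midpoint rule."""
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * h          # midpoints avoid the endpoints u = 0 and u = 1
        total += phi(u) / dphi(u)
    return 1.0 + 4.0 * total * h


def clayton_generator(theta):
    """Generator phi(u) = u^(-theta) - 1 and its derivative."""
    return (lambda u: u ** (-theta) - 1.0,
            lambda u: -theta * u ** (-theta - 1.0))


def gumbel_generator(theta):
    """Generator phi(u) = (-ln u)^theta and its derivative."""
    return (lambda u: (-math.log(u)) ** theta,
            lambda u: -theta * (-math.log(u)) ** (theta - 1.0) / u)
```

For the Clayton generator with $\theta = 2$, the numerical value should be close to $\theta/(\theta+2) = 0.5$, in line with Example 120.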


⁶ We use the notations $\bar{u} = 1 - u$ and $\tilde{u} = -\ln u$.

11.3.1.3 Two-parameter Archimedean copulas


Nelsen (2006) showed that if $\varphi(t)$ is a strict generator, then we can build two-parameter
Archimedean copulas by considering the following generator:
\[
\varphi_{\alpha,\beta}(t) = \left(\varphi(t^{\alpha})\right)^{\beta}
\]
where $\alpha > 0$ and $\beta \geq 1$. For instance, if $\varphi(t) = t^{-1} - 1$, the two-parameter generator is
$\varphi_{\alpha,\beta}(t) = \left(t^{-\alpha} - 1\right)^{\beta}$. Therefore, the corresponding copula function is defined by:
\[
C(u_1,u_2) = \left(\left[\left(u_1^{-\alpha} - 1\right)^{\beta} + \left(u_2^{-\alpha} - 1\right)^{\beta}\right]^{1/\beta} + 1\right)^{-1/\alpha}
\]
This is a generalization of the Clayton copula, which is obtained when the parameter $\beta$ is
equal to 1.
11.3.1.4 Extension to the multivariate case 
We can build multivariate Archimedean copulas in the following way: 
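\[
C(u_1,\ldots,u_n) = \varphi^{-1}\left(\varphi(u_1) + \ldots + \varphi(u_n)\right)
\]
However, $C$ is a copula function if and only if the function $\varphi^{-1}(u)$ is completely monotone
(Nelsen, 2006):
\[
(-1)^k \frac{\mathrm{d}^k}{\mathrm{d}u^k}\varphi^{-1}(u) \geq 0 \quad \forall k \geq 1
\]
For instance, the multivariate Gumbel copula is defined by:
\[
C(u_1,\ldots,u_n) = \exp\left(-\left((-\ln u_1)^{\theta} + \ldots + (-\ln u_n)^{\theta}\right)^{1/\theta}\right)
\]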


The previous construction is related to an important class of multivariate distributions,
which are called frailty models (Oakes, 1989). Let $F_1,\ldots,F_n$ be univariate distribution
functions, and let $G$ be an $n$-variate distribution function with univariate marginals $G_i$,
such that $\bar{G}(0,\ldots,0) = 1$. We denote by $\psi_i$ the Laplace transform of $G_i$. Marshall and
Olkin (1988) showed that the function defined by:
\[
F(x_1,\ldots,x_n) = \int\cdots\int C\left(H_1^{t_1}(x_1),\ldots,H_n^{t_n}(x_n)\right)\mathrm{d}G(t_1,\ldots,t_n)
\]
is a multivariate probability distribution with marginals $F_1,\ldots,F_n$ if $H_i(x) = \exp\left(-\psi_i^{-1}(F_i(x))\right)$.
If we assume that the univariate distributions $G_i$ are the same and
equal to $G_1$, $G$ is the upper Fréchet bound and $C$ is the product copula $C^{\perp}$, the previous
expression becomes:
\[
\begin{aligned}
F(x_1,\ldots,x_n) &= \int \prod_{i=1}^{n} H_i^{t_1}(x_i)\,\mathrm{d}G_1(t_1) \\
&= \int \exp\left(-t_1 \sum_{i=1}^{n} \psi^{-1}(F_i(x_i))\right)\mathrm{d}G_1(t_1) \\
&= \psi\left(\psi^{-1}(F_1(x_1)) + \ldots + \psi^{-1}(F_n(x_n))\right)
\end{aligned}
\]
The corresponding copula is then given by:
\[
C(u_1,\ldots,u_n) = \psi\left(\psi^{-1}(u_1) + \ldots + \psi^{-1}(u_n)\right)
\]
This is a special case of Archimedean copulas where the generator $\varphi$ is the inverse of
a Laplace transform. For instance, the Clayton copula is a frailty copula where $\psi(x) = (1+x)^{-1/\theta}$
is the Laplace transform of a Gamma random variable. The Gumbel-Hougaard
copula is frailty too and we have $\psi(x) = \exp\left(-x^{1/\theta}\right)$. This is the Laplace transform of a
positive stable distribution.


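For frailty copulas, Joe (1997) showed that upper and lower tail dependence measures
are given by:
\[
\lambda^{+} = 2 - 2\lim_{x \to 0}\frac{\psi'(2x)}{\psi'(x)}
\]
and:
\[
\lambda^{-} = 2\lim_{x \to \infty}\frac{\psi'(2x)}{\psi'(x)}
\]

Example 121 In the case of the Clayton copula, the Laplace transform is $\psi(x) = (1+x)^{-1/\theta}$. We have:
\[
\frac{\psi'(2x)}{\psi'(x)} = \frac{(1+2x)^{-1/\theta-1}}{(1+x)^{-1/\theta-1}}
\]
We deduce that:
\[
\begin{aligned}
\lambda^{+} &= 2 - 2\lim_{x \to 0}\frac{(1+2x)^{-1/\theta-1}}{(1+x)^{-1/\theta-1}} \\
&= 2 - 2 \\
&= 0
\end{aligned}
\]
and:
\[
\begin{aligned}
\lambda^{-} &= 2\lim_{x \to \infty}\frac{(1+2x)^{-1/\theta-1}}{(1+x)^{-1/\theta-1}} \\
&= 2 \times 2^{-1/\theta-1} \\
&= 2^{-1/\theta}
\end{aligned}
\]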
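The two limits can also be approximated numerically. In the sketch below (an illustration only; the function names and the evaluation points `x_small` and `x_large` are assumptions of the example), the ratio $\psi'(2x)/\psi'(x)$ is coded in closed form for the Clayton and Gumbel-Hougaard Laplace transforms, which avoids dividing two quantities that both underflow for large $x$.

```python
import math


def clayton_ratio(x, theta):
    """psi'(2x) / psi'(x) for psi(x) = (1 + x)^(-1/theta)."""
    return ((1.0 + 2.0 * x) / (1.0 + x)) ** (-1.0 / theta - 1.0)


def gumbel_ratio(x, theta):
    """psi'(2x) / psi'(x) for psi(x) = exp(-x^(1/theta)), simplified by hand."""
    p = 1.0 / theta
    return 2.0 ** (p - 1.0) * math.exp(-(2.0 ** p - 1.0) * x ** p)


def frailty_tail_dependence(ratio, theta, x_small=1e-12, x_large=1e12):
    """Return (lambda_minus, lambda_plus) by evaluating the ratio near its limits."""
    lambda_plus = 2.0 - 2.0 * ratio(x_small, theta)
    lambda_minus = 2.0 * ratio(x_large, theta)
    return lambda_minus, lambda_plus
```

With $\theta = 2$, the Clayton case should return values close to $(2^{-1/2}, 0)$ and the Gumbel-Hougaard case values close to $(0, 2 - 2^{1/2})$, consistent with the tail dependence coefficients given earlier in the chapter.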


11.3.2 Normal copula 


The Normal copula is the dependence function of the multivariate normal distribution
with a correlation matrix $\rho$:
\[
C(u_1,\ldots,u_n;\rho) = \Phi_n\left(\Phi^{-1}(u_1),\ldots,\Phi^{-1}(u_n);\rho\right)
\]
By using the canonical decomposition of the multivariate density function:
\[
f(x_1,\ldots,x_n) = c\left(F_1(x_1),\ldots,F_n(x_n)\right)\prod_{i=1}^{n} f_i(x_i)
\]
we deduce that the probability density function of the Normal copula is:
\[
c(u_1,\ldots,u_n;\rho) = \frac{1}{|\rho|^{1/2}}\exp\left(-\frac{1}{2}x^{\top}\left(\rho^{-1} - I_n\right)x\right)
\]




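where $x_i = \Phi^{-1}(u_i)$. In the bivariate case, we obtain⁷:
\[
c(u_1,u_2;\rho) = \frac{1}{\sqrt{1-\rho^2}}\exp\left(-\frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{2(1-\rho^2)} + \frac{x_1^2 + x_2^2}{2}\right)
\]
It follows that the expression of the bivariate Normal copula function is also equal to:
\[
C(u_1,u_2;\rho) = \int_{-\infty}^{\Phi^{-1}(u_1)}\int_{-\infty}^{\Phi^{-1}(u_2)} \phi_2(x_1,x_2;\rho)\,\mathrm{d}x_1\,\mathrm{d}x_2 \tag{11.6}
\]
where $\phi_2(x_1,x_2;\rho)$ is the bivariate normal density:
\[
\phi_2(x_1,x_2;\rho) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(-\frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{2(1-\rho^2)}\right)
\]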


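Example 122 Let $(X_1,X_2)$ be a standardized Gaussian random vector, whose cross-correlation
is $\rho$. Using the Cholesky decomposition, we write $X_2$ as follows:
\[
X_2 = \rho X_1 + \sqrt{1-\rho^2}\,X_3
\]
where $X_3 \sim \mathcal{N}(0,1)$ is independent from $X_1$. We have:
\[
\begin{aligned}
\Phi_2(x_1,x_2;\rho) &= \Pr\{X_1 \leq x_1, X_2 \leq x_2\} \\
&= \mathbb{E}\left[\Pr\left\{X_1 \leq x_1, \rho X_1 + \sqrt{1-\rho^2}\,X_3 \leq x_2 \mid X_1\right\}\right] \\
&= \int_{-\infty}^{x_1} \Pr\left\{X_3 \leq \frac{x_2 - \rho x}{\sqrt{1-\rho^2}}\right\}\phi(x)\,\mathrm{d}x \\
&= \int_{-\infty}^{x_1} \Phi\left(\frac{x_2 - \rho x}{\sqrt{1-\rho^2}}\right)\phi(x)\,\mathrm{d}x
\end{aligned}
\]
It follows that:
\[
C(u_1,u_2;\rho) = \int_{-\infty}^{\Phi^{-1}(u_1)} \Phi\left(\frac{\Phi^{-1}(u_2) - \rho x}{\sqrt{1-\rho^2}}\right)\phi(x)\,\mathrm{d}x
\]
We finally obtain that the bivariate Normal copula function is equal to:
\[
C(u_1,u_2;\rho) = \int_0^{u_1} \Phi\left(\frac{\Phi^{-1}(u_2) - \rho\,\Phi^{-1}(u)}{\sqrt{1-\rho^2}}\right)\mathrm{d}u \tag{11.7}
\]
This expression is more convenient to use than Equation (11.6).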
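Equation (11.7) only requires a one-dimensional integral of the univariate normal distribution function. A minimal Python sketch, assuming a midpoint rule and the standard-library `statistics.NormalDist` class (the function name and the number of integration steps are choices of this example), could read as follows.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def normal_copula(u1, u2, rho, n_steps=4000):
    """Bivariate Normal copula C(u1, u2; rho) computed from Equation (11.7).

    The integral over (0, u1) is approximated with a midpoint rule, which
    never evaluates the quantile function at 0. Requires -1 < rho < 1.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z2 = _STD_NORMAL.inv_cdf(u2)
    scale = math.sqrt(1.0 - rho * rho)
    h = u1 / n_steps
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * h
        total += _STD_NORMAL.cdf((z2 - rho * _STD_NORMAL.inv_cdf(u)) / scale)
    return total * h
```

Two sanity checks are available: for $\rho = 0$ the function should return $u_1 u_2$, and at $u_1 = u_2 = 0.5$ the classical orthant probability gives $C = 1/4 + \arcsin(\rho)/(2\pi)$.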


Like the normal distribution, the Normal copula is easy to manipulate for computational
purposes. For instance, Kendall's tau and Spearman's rho are equal to:
\[
\tau = \frac{2}{\pi}\arcsin\rho
\]
and:
\[
\varrho = \frac{6}{\pi}\arcsin\frac{\rho}{2}
\]


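⁷ In the bivariate case, the parameter $\rho$ is the cross-correlation between $X_1$ and $X_2$, that is the element
(1,2) of the correlation matrix.

The conditional distribution $C_{2|1}(u_1,u_2)$ has the following expression:
\[
\begin{aligned}
C_{2|1}(u_1,u_2) &= \partial_1 C(u_1,u_2) \\
&= \Phi\left(\frac{\Phi^{-1}(u_2) - \rho\,\Phi^{-1}(u_1)}{\sqrt{1-\rho^2}}\right)
\end{aligned}
\]
To compute the tail dependence, we apply Equation (11.4) and obtain:
\[
\begin{aligned}
\lambda^{+} &= 2 - 2\lim_{u \to 1^{-}} \Phi\left(\frac{\Phi^{-1}(u) - \rho\,\Phi^{-1}(u)}{\sqrt{1-\rho^2}}\right) \\
&= 2 - 2\lim_{u \to 1^{-}} \Phi\left(\frac{\sqrt{1-\rho}}{\sqrt{1+\rho}}\,\Phi^{-1}(u)\right)
\end{aligned}
\]
We finally deduce that:
\[
\lambda^{+} = \lambda^{-} = \begin{cases} 0 & \text{if } \rho < 1 \\ 1 & \text{if } \rho = 1 \end{cases}
\]
In Figure 11.11, we have represented the quantile-quantile dependence measure $\lambda^{+}(\alpha)$ for
several values of the parameter $\rho$. When $\rho$ is equal to 90% and $\alpha$ is close to one, we notice
that $\lambda^{+}(\alpha)$ dramatically decreases. This means that even if the correlation is high, the
extremes are independent.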


FIGURE 11.11: Tail dependence $\lambda^{+}(\alpha)$ for the Normal copula


11.3.3 Student’s t copula 


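In a similar way, the Student's t copula is the dependence function associated with the
multivariate Student's t probability distribution:
\[
C(u_1,\ldots,u_n;\rho,\nu) = T_n\left(T_{\nu}^{-1}(u_1),\ldots,T_{\nu}^{-1}(u_n);\rho,\nu\right)
\]
By using the definition of the cumulative distribution function:
\[
T_n(x_1,\ldots,x_n;\rho,\nu) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n} \frac{\Gamma\left(\frac{\nu+n}{2}\right)|\rho|^{-1/2}}{\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{n/2}}\left(1 + \frac{1}{\nu}x^{\top}\rho^{-1}x\right)^{-\frac{\nu+n}{2}}\mathrm{d}x
\]
we can show that the copula density is then:
\[
c(u_1,\ldots,u_n;\rho,\nu) = |\rho|^{-1/2}\,\frac{\Gamma\left(\frac{\nu+n}{2}\right)\left[\Gamma\left(\frac{\nu}{2}\right)\right]^{n}}{\left[\Gamma\left(\frac{\nu+1}{2}\right)\right]^{n}\Gamma\left(\frac{\nu}{2}\right)}\,\frac{\left(1 + \frac{1}{\nu}x^{\top}\rho^{-1}x\right)^{-\frac{\nu+n}{2}}}{\prod_{i=1}^{n}\left(1 + \frac{x_i^2}{\nu}\right)^{-\frac{\nu+1}{2}}}
\]
where $x_i = T_{\nu}^{-1}(u_i)$. In the bivariate case, we deduce that the t copula has the following
expression:
\[
C(u_1,u_2;\rho,\nu) = \int_{-\infty}^{T_{\nu}^{-1}(u_1)}\int_{-\infty}^{T_{\nu}^{-1}(u_2)} \frac{1}{2\pi\sqrt{1-\rho^2}}\left(1 + \frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{\nu(1-\rho^2)}\right)^{-\frac{\nu+2}{2}}\mathrm{d}x_1\,\mathrm{d}x_2
\]
Like the Normal copula, we can obtain another expression, which is easier to manipulate. Let
$(X_1,X_2)$ be a random vector whose probability distribution is $T_2(x_1,x_2;\rho,\nu)$. Conditionally
to $X_1 = x_1$, we have:
\[
\left(\frac{\nu+1}{\nu + x_1^2}\right)^{1/2}\frac{X_2 - \rho x_1}{\sqrt{1-\rho^2}} \sim T_{\nu+1}
\]
The conditional distribution $C_{2|1}(u_1,u_2)$ is then equal to:
\[
C_{2|1}(u_1,u_2;\rho,\nu) = T_{\nu+1}\left(\left(\frac{\nu+1}{\nu + \left[T_{\nu}^{-1}(u_1)\right]^2}\right)^{1/2}\frac{T_{\nu}^{-1}(u_2) - \rho\,T_{\nu}^{-1}(u_1)}{\sqrt{1-\rho^2}}\right)
\]
We deduce that:
\[
C(u_1,u_2;\rho,\nu) = \int_0^{u_1} C_{2|1}(u,u_2;\rho,\nu)\,\mathrm{d}u
\]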


We can show that the expression of Kendall's tau for the t copula is the one obtained
for the Normal copula. In the case of Spearman's rho, there is no analytical expression.
We denote by $\varrho_t(\rho,\nu)$ and $\varrho_n(\rho)$ the values of Spearman's rho for Student's t and Normal
copulas with the same parameter $\rho$. We can show that $\varrho_t(\rho,\nu) > \varrho_n(\rho)$ for negative values of
$\rho$ and $\varrho_t(\rho,\nu) < \varrho_n(\rho)$ for positive values of $\rho$. In Figure 11.12, we report the relationship
between $\tau$ and $\varrho$ for different degrees of freedom $\nu$.


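Because the t copula is symmetric, we can apply Equation (11.4) and obtain:
\[
\begin{aligned}
\lambda^{+} &= 2 - 2\lim_{u \to 1^{-}} T_{\nu+1}\left(\left(\frac{\nu+1}{\nu + \left[T_{\nu}^{-1}(u)\right]^2}\right)^{1/2}\frac{(1-\rho)\,T_{\nu}^{-1}(u)}{\sqrt{1-\rho^2}}\right) \\
&= 2 - 2\,T_{\nu+1}\left(\left(\frac{(\nu+1)(1-\rho)}{1+\rho}\right)^{1/2}\right)
\end{aligned}
\]
We finally deduce that:
\[
\lambda^{+} = \begin{cases} 0 & \text{if } \rho = -1 \\ > 0 & \text{if } \rho > -1 \end{cases}
\]

FIGURE 11.12: Relationship between $\tau$ and $\varrho$ of the Student's t copula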


Contrary to the Normal copula, the t copula has an upper tail dependence. In Figures 11.13
and 11.14, we represent the quantile-quantile dependence measure $\lambda^{+}(\alpha)$ for two degrees
of freedom $\nu$. We observe that the behavior of $\lambda^{+}(\alpha)$ differs from the one obtained
in Figure 11.11 with the Normal copula. In Table 11.2, we give the numerical values of
the coefficient $\lambda^{+}$ for various values of $\rho$ and $\nu$. We notice that it is strictly positive for
small degrees of freedom even if the parameter $\rho$ is negative. For instance, $\lambda^{+}$ is equal to
13.40% when $\nu$ and $\rho$ are equal to 1 and -50%. We also observe that the convergence to
the Gaussian case is slow when the parameter $\rho$ is positive.


TABLE 11.2: Values in % of the upper tail dependence $\lambda^{+}$ for the Student's t copula


                        Parameter $\rho$ (in %)
$\nu$        -70.00   -50.00     0.00    50.00    70.00    90.00
1              7.80    13.40    29.29    50.00    61.27    77.64
2              2.59     5.77    18.17    39.10    51.95    71.77
3              0.89     2.57    11.61    31.25    44.81    67.02
4              0.31     1.17     7.56    25.32    39.07    62.98
6              0.04     0.25     3.31    17.05    30.31    56.30
10             0.00     0.01     0.69     8.19    19.11    46.27
$\infty$       0.00     0.00     0.00     0.00     0.00     0.00
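The entries of Table 11.2 follow from the closed-form expression $\lambda^{+} = 2 - 2\,T_{\nu+1}\left(\sqrt{(\nu+1)(1-\rho)/(1+\rho)}\right)$. The sketch below is one possible way to reproduce them with the Python standard library only; since the standard library has no Student's t distribution function, the helper `student_t_cdf` integrates the density with Simpson's rule, which is an implementation choice of this example and not the author's method.

```python
import math


def student_t_cdf(x, nu, n_steps=2000):
    """Student's t distribution function with nu degrees of freedom.

    The density is integrated on [0, |x|] with Simpson's rule (n_steps even)
    and the result is combined with the symmetry T(0) = 1/2.
    """
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))

    def pdf(t):
        return math.exp(log_norm - (nu + 1.0) / 2.0 * math.log1p(t * t / nu))

    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n_steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * pdf(k * h)
    half = total * h / 3.0
    return 0.5 + half if x > 0 else 0.5 - half


def t_copula_upper_tail(rho, nu):
    """lambda^+ = 2 - 2 * T_{nu+1}(sqrt((nu + 1)(1 - rho) / (1 + rho)))."""
    if rho <= -1.0:
        return 0.0
    arg = math.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 - 2.0 * student_t_cdf(arg, nu + 1.0)
```

For instance, `100 * t_copula_upper_tail(-0.5, 1)` should be close to the value 13.40 reported in the table for $\nu = 1$ and $\rho = -50\%$.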


Remark 139 The Normal copula is a particular case of the Student's t copula when $\nu$ tends
to $\infty$. This is why these two copulas are often compared for a given value of $\rho$. However, we
must be careful because the previous analysis of the tail dependence has shown that these two
copulas are very different. Let us consider the bivariate case. We can write the Student's t
random vector $(T_1,T_2)$ as follows:
\[
(T_1,T_2) = \frac{(N_1,N_2)}{\sqrt{X/\nu}} = \left(\frac{N_1}{\sqrt{X/\nu}}, \frac{N_2}{\sqrt{X/\nu}}\right)
\]
where $N_1$ and $N_2$ are two independent Gaussian random variables and $X$ is a random
variable, whose probability distribution is $\chi^2(\nu)$. It is the introduction of the random
variable $X$ that produces a strong dependence between $T_1$ and $T_2$, and correlates the extremes.
Even if the parameter $\rho$ is equal to zero, we obtain:
\[
\lambda^{+} = 2 - 2\,T_{\nu+1}\left(\sqrt{\nu+1}\right) > 0
\]
This implies that the product copula $C^{\perp}$ can never be attained by the t copula.


FIGURE 11.14: Tail dependence $\lambda^{+}(\alpha)$ for the Student's t copula ($\nu = 4$)


11.4 Statistical inference and estimation of copula functions 


We now consider the estimation problem of copula functions. We first introduce the
empirical copula, which may be viewed as a non-parametric estimator of the copula function.
Then, we discuss the method of moments to estimate the parameters of copula functions.
Finally, we apply the method of maximum likelihood and show the different forms of
implementation.


11.4.1 The empirical copula 


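Let $\hat{F}$ be the empirical distribution associated to a sample of $T$ observations of the
random vector $(X_1,\ldots,X_n)$. Following Deheuvels (1979), any copula $\hat{C} \in \mathcal{C}$ defined on the
lattice $\mathcal{L}$:
\[
\mathcal{L} = \left\{\left(\frac{t_1}{T},\ldots,\frac{t_n}{T}\right) : 1 \leq j \leq n,\ t_j = 0,\ldots,T\right\}
\]
by the function:
\[
\hat{C}\left(\frac{t_1}{T},\ldots,\frac{t_n}{T}\right) = \frac{1}{T}\sum_{t=1}^{T}\prod_{i=1}^{n} \mathbb{1}\left\{R_{t,i} \leq t_i\right\}
\]
is an empirical copula. Here $R_{t,i}$ is the rank statistic of the random variable $X_i$ meaning
that $X_{R_{t,i}:T,i} = X_{t,i}$. We notice that $\hat{C}$ is the copula function associated to the empirical
distribution $\hat{F}$. However, $\hat{C}$ is not unique because $\hat{F}$ is not continuous. In the bivariate case,
we obtain:
\[
\begin{aligned}
\hat{C}\left(\frac{t_1}{T},\frac{t_2}{T}\right) &= \frac{1}{T}\sum_{t=1}^{T} \mathbb{1}\left\{R_{t,1} \leq t_1, R_{t,2} \leq t_2\right\} \\
&= \frac{1}{T}\sum_{t=1}^{T} \mathbb{1}\left\{x_{t,1} \leq x_{t_1:T,1},\ x_{t,2} \leq x_{t_2:T,2}\right\}
\end{aligned}
\]
where $\{(x_{t,1},x_{t,2}), t = 1,\ldots,T\}$ denotes the sample of $(X_1,X_2)$. Nelsen (2006) defines the
empirical copula frequency function as follows:
\[
\begin{aligned}
\hat{c}\left(\frac{t_1}{T},\frac{t_2}{T}\right) &= \hat{C}\left(\frac{t_1}{T},\frac{t_2}{T}\right) - \hat{C}\left(\frac{t_1-1}{T},\frac{t_2}{T}\right) - \hat{C}\left(\frac{t_1}{T},\frac{t_2-1}{T}\right) + \hat{C}\left(\frac{t_1-1}{T},\frac{t_2-1}{T}\right) \\
&= \frac{1}{T}\sum_{t=1}^{T} \mathbb{1}\left\{x_{t,1} = x_{t_1:T,1},\ x_{t,2} = x_{t_2:T,2}\right\}
\end{aligned}
\]
We have then:
\[
\hat{C}\left(\frac{t_1}{T},\frac{t_2}{T}\right) = \sum_{j_1=1}^{t_1}\sum_{j_2=1}^{t_2} \hat{c}\left(\frac{j_1}{T},\frac{j_2}{T}\right)
\]
We can interpret $\hat{c}$ as the probability density function of the sample.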
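The bivariate empirical copula and its frequency function can be computed from the rank statistics alone. The following sketch is a minimal illustration in plain Python, assuming a sample without ties; the function names are hypothetical. It also returns the normalized ranks $R_{t,i}/T$, which are the points plotted in a dependogram.

```python
def ranks(values):
    """Rank statistics R_t in {1, ..., T} (the sample is assumed to have no ties)."""
    order = sorted(range(len(values)), key=lambda t: values[t])
    r = [0] * len(values)
    for position, t in enumerate(order, start=1):
        r[t] = position
    return r


def empirical_copula(x1, x2):
    """Return (C, freq) on the lattice, with C[t1][t2] = C_hat(t1/T, t2/T)."""
    T = len(x1)
    r1, r2 = ranks(x1), ranks(x2)
    # Frequency function: mass 1/T at each observed pair of ranks
    freq = [[0.0] * (T + 1) for _ in range(T + 1)]
    for t in range(T):
        freq[r1[t]][r2[t]] += 1.0 / T
    # Empirical copula: cumulative sum of the frequency function
    C = [[0.0] * (T + 1) for _ in range(T + 1)]
    for i in range(1, T + 1):
        for j in range(1, T + 1):
            C[i][j] = freq[i][j] + C[i - 1][j] + C[i][j - 1] - C[i - 1][j - 1]
    return C, freq


def dependogram(x1, x2):
    """Normalized ranks (R_{t,1} / T, R_{t,2} / T)."""
    T = len(x1)
    return [(a / T, b / T) for a, b in zip(ranks(x1), ranks(x2))]
```

By construction, the last row and column of `C` reproduce the uniform margins, that is `C[T][k]` equals `k / T`.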


Example 123 We consider the daily returns of European (EU) and American (US) MSCI 
equity indices from January 2006 to December 2015. In Figure 11.15, we represent the level 
lines of the empirical copula and compare them with the level lines of the Normal copula. 
For this copula function, the parameter $\rho$ is estimated by the linear correlation between the
daily returns of the two MSCI equity indices. We notice that the Normal copula does not 
exactly fit the empirical copula. 


FIGURE 11.15: Comparison of the empirical copula (solid line) and the Normal copula
(dashed line); axes: EU Equity and US Equity

Like the histogram of the empirical distribution function $\hat{f}$, it is difficult to extract
information from $\hat{C}$ or $\hat{c}$, because these functions are not smooth⁸. It is better to use a
dependogram. This representation has been introduced by Deheuvels (1981), and consists
in transforming the sample $\{(x_{t,1},x_{t,2}), t = 1,\ldots,T\}$ of the random vector $(X_1,X_2)$ into a
sample $\{(u_{t,1},u_{t,2}), t = 1,\ldots,T\}$ of uniform random variables $(U_1,U_2)$ by considering the
rank statistics:
\[
u_{t,i} = \frac{1}{T}R_{t,i}
\]

FIGURE 11.16: Dependogram of EU and US equity returns (axes: EU Equity and US Equity)


The dependogram is then the scatter plot between $u_{t,1}$ and $u_{t,2}$. For instance, Figure 11.16
shows the dependogram of EU and US equity returns. We can compare this figure with
the one obtained by assuming that equity returns are Gaussian. Indeed, Figure 11.17 shows
the dependogram of a simulated bivariate Gaussian random vector when the correlation is
equal to 57.8%, which is the estimated value between EU and US equity returns during the
study period.


11.4.2 The method of moments 


When it is applied to copulas, this method is different from the one presented in Chapter
10. Indeed, it consists in estimating the parameters $\theta$ of the copula function from the
population version of concordance measures. For instance, if $\tau = f_{\tau}(\theta)$ is the relationship
between $\theta$ and Kendall's tau, the MM estimator is simply the inverse of this relationship:
\[
\hat{\theta} = f_{\tau}^{-1}(\hat{\tau})
\]
where $\hat{\tau}$ is the estimate of Kendall's tau based on the sample⁹. For instance, in the case of
the Gumbel copula, we obtain:
\[
\hat{\theta} = \frac{1}{1 - \hat{\tau}}
\]

⁸ This is why they are generally coupled with approximation methods based on Bernstein polynomials
(Sancetta and Satchell, 2004).

FIGURE 11.17: Dependogram of simulated Gaussian returns


Remark 140 This approach is also valid for other concordance measures like Spearman's
rho. We have then:
\[
\hat{\theta} = f_{\varrho}^{-1}(\hat{\varrho})
\]
where $\hat{\varrho}$ is the estimate¹⁰ of Spearman's rho and $f_{\varrho}$ is the theoretical relationship between $\theta$
and Spearman's rho.
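As an illustration of the method of moments, the sketch below estimates Kendall's tau from the numbers of concordant and discordant pairs and inverts the relationships stated in this chapter: $\tau = 1 - 1/\theta$ for the Gumbel copula, $\tau = \theta/(\theta+2)$ for the Clayton copula, $\tau = (2/\pi)\arcsin\rho$ and $\varrho = (6/\pi)\arcsin(\rho/2)$ for the Normal copula. The function names are assumptions of the example, and the quadratic loop over pairs is chosen for readability rather than speed.

```python
import math


def kendall_tau(x, y):
    """Sample Kendall's tau (c - d) / (c + d) from concordant/discordant pairs."""
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return (c - d) / (c + d)


def gumbel_theta_from_tau(tau):
    """Inverse of tau = 1 - 1/theta."""
    return 1.0 / (1.0 - tau)


def clayton_theta_from_tau(tau):
    """Inverse of tau = theta / (theta + 2)."""
    return 2.0 * tau / (1.0 - tau)


def normal_rho_from_tau(tau):
    """Inverse of tau = (2/pi) * arcsin(rho)."""
    return math.sin(math.pi * tau / 2.0)


def normal_rho_from_spearman(rho_s):
    """Inverse of Spearman's rho = (6/pi) * arcsin(rho / 2)."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)
```

A typical use is `gumbel_theta_from_tau(kendall_tau(x, y))`, where `x` and `y` are the two observed return series.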


Example 124 We consider the daily returns of 5 asset classes from January 2006 to De- 
cember 2015. These asset classes are represented by the European MSCI equity index, the 
American MSCI equity index, the Barclays sovereign bond index, the Barclays corporate 
investment grade bond index and the Bloomberg commodity index. In Table 11.3, we report 
the correlation matrix. In Tables 11.4 and 11.5, we assume that the dependence function is a
Normal copula and give the matrix $\hat{\rho}$ of estimated parameters using the method of moments
based on Kendall's tau and Spearman's rho. We notice that these two matrices are very
close, but we also observe some important differences with the correlation matrix reported
in Table 11.3.


⁹ We have:
\[
\hat{\tau} = \frac{c - d}{c + d}
\]
where $c$ and $d$ are respectively the number of concordant and discordant pairs.
¹⁰ It is equal to the linear correlation between the rank statistics.


TABLE 11.3: Matrix of linear correlations $\hat{\rho}_{i,j}$

             EU Equity   US Equity   Sovereign   Credit   Commodity
EU Equity        100.0
US Equity         57.8       100.0
Sovereign        -34.0       -32.6       100.0
Credit           -15.1       -28.6        69.3    100.0
Commodity         51.8        34.3       -22.3    -14.4       100.0


TABLE 11.4: Matrix of parameters $\hat{\rho}_{i,j}$ estimated using Kendall's tau

             EU Equity   US Equity   Sovereign   Credit   Commodity
EU Equity        100.0
US Equity         57.7       100.0
Sovereign        -31.8       -32.1       100.0
Credit           -17.6       -33.8        73.9    100.0
Commodity         43.4        30.3       -19.6    -15.2       100.0


TABLE 11.5: Matrix of parameters $\hat{\rho}_{i,j}$ estimated using Spearman's rho

             EU Equity   US Equity   Sovereign   Credit   Commodity
EU Equity        100.0
US Equity         55.4       100.0
Sovereign        -31.0       -31.3       100.0
Credit           -17.1       -32.7        73.0    100.0
Commodity         42.4        29.4       -19.2    -14.9       100.0

11.4.3 The method of maximum likelihood 
Let us denote by $\{(x_{t,1},\ldots,x_{t,n}), t = 1,\ldots,T\}$ the sample of the random vector
$(X_1,\ldots,X_n)$, whose multivariate distribution function has the following canonical
decomposition:
\[
F(x_1,\ldots,x_n) = C\left(F_1(x_1;\theta_1),\ldots,F_n(x_n;\theta_n);\theta_c\right)
\]
This means that this statistical model depends on two types of parameters:

• the parameters $(\theta_1,\ldots,\theta_n)$ of univariate distribution functions;

• the parameters $\theta_c$ of the copula function.

The expression of the log-likelihood function is:
\[
\ell(\theta_1,\ldots,\theta_n,\theta_c) = \sum_{t=1}^{T} \ln c\left(F_1(x_{t,1};\theta_1),\ldots,F_n(x_{t,n};\theta_n);\theta_c\right) + \sum_{t=1}^{T}\sum_{i=1}^{n} \ln f_i(x_{t,i};\theta_i)
\]



where $c$ is the copula density and $f_i$ is the probability density function associated to $F_i$.
The ML estimator is then defined as follows:
\[
\left(\hat{\theta}_1,\ldots,\hat{\theta}_n,\hat{\theta}_c\right) = \arg\max \ell(\theta_1,\ldots,\theta_n,\theta_c)
\]
The estimation by maximum likelihood method can be time-consuming when the number
of parameters is large. However, the copula approach suggests a two-stage parametric
method (Shih and Louis, 1995):

1. the first stage involves maximum likelihood from univariate marginals, meaning that
we estimate the parameters $\theta_1,\ldots,\theta_n$ separately for each marginal:
\[
\hat{\theta}_i = \arg\max \sum_{t=1}^{T} \ln f_i(x_{t,i};\theta_i)
\]

2. the second stage involves maximum likelihood of the copula parameters $\theta_c$ with the
univariate parameters $\hat{\theta}_1,\ldots,\hat{\theta}_n$ held fixed from the first stage:
\[
\hat{\theta}_c = \arg\max \sum_{t=1}^{T} \ln c\left(F_1(x_{t,1};\hat{\theta}_1),\ldots,F_n(x_{t,n};\hat{\theta}_n);\theta_c\right)
\]

This approach is known as the method of inference functions for marginals or IFM (Joe,
1997). Let $\hat{\theta}_{\mathrm{IFM}}$ be the IFM estimator obtained with this two-stage procedure. We have:
\[
T^{1/2}\left(\hat{\theta}_{\mathrm{IFM}} - \theta_0\right) \rightarrow \mathcal{N}\left(0, \mathcal{V}^{-1}(\theta_0)\right)
\]
where $\mathcal{V}(\theta_0)$ is the Godambe matrix (Joe, 1997).

Genest et al. (1995) propose a third estimation method, which consists in estimating
the copula parameters $\theta_c$ by considering the non-parametric estimates of the marginals
$F_1,\ldots,F_n$:
\[
\hat{\theta}_c = \arg\max \sum_{t=1}^{T} \ln c\left(\hat{F}_1(x_{t,1}),\ldots,\hat{F}_n(x_{t,n});\theta_c\right)
\]
In this case, $\hat{F}_i(x_{t,i})$ is the normalized rank $R_{t,i}/T$. This estimator called omnibus or OM
is then the ML estimate applied to the dependogram.
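The omnibus estimator can be sketched for a one-parameter family such as the Frank copula. The code below is a simplified illustration and not the procedure used by the author: it restricts the search to $\theta_c > 0$, maximizes the log-likelihood with a golden-section search on a hypothetical interval, and normalizes the ranks by $T + 1$ instead of $T$ so that the pseudo-observations stay strictly inside $(0,1)$. The simulator `simulate_frank` (conditional inversion) is only included to produce test data.

```python
import math
import random


def frank_log_density(u1, u2, theta):
    """Log-density of the Frank copula for theta > 0."""
    a, b = math.exp(-theta * u1), math.exp(-theta * u2)
    # (1 - e^-theta) - (1 - a)(1 - b), rearranged to limit cancellation
    denom = a + b - a * b - math.exp(-theta)
    return (math.log(theta) + math.log(-math.expm1(-theta))
            - theta * (u1 + u2) - 2.0 * math.log(denom))


def pseudo_observations(values):
    """Normalized ranks R_t / (T + 1) (assumption: T + 1 instead of T)."""
    T = len(values)
    order = sorted(range(T), key=lambda t: values[t])
    u = [0.0] * T
    for position, t in enumerate(order, start=1):
        u[t] = position / (T + 1)
    return u


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                      # the maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                            # the maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def omnibus_frank(x1, x2, lo=0.01, hi=30.0):
    """Omnibus (rank-based ML) estimate of the Frank parameter theta_c."""
    u1, u2 = pseudo_observations(x1), pseudo_observations(x2)

    def log_likelihood(theta):
        return sum(frank_log_density(a, b, theta) for a, b in zip(u1, u2))

    return golden_section_max(log_likelihood, lo, hi)


def simulate_frank(T, theta, seed=0):
    """Simulate T pairs from the Frank copula by conditional inversion."""
    rng = random.Random(seed)
    d = math.expm1(-theta)               # e^-theta - 1
    sample = []
    for _ in range(T):
        u1, w = rng.random(), rng.random()
        b = w * d / (w + (1.0 - w) * math.exp(-theta * u1))
        sample.append((u1, -math.log1p(b) / theta))
    return sample
```

Because the estimator only depends on ranks, applying any increasing transformation to each margin leaves the estimate unchanged, which is the reason why it is not affected by a wrong specification of the marginals.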


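Example 125 Let us assume that the dependence function of asset returns $(X_1,X_2)$ is the
Frank copula whereas the marginals are Gaussian. The log-likelihood function for observation
$t$ is then equal to:
\[
\begin{aligned}
\ell_t = {} & \ln\left(\theta_c\left(1 - e^{-\theta_c}\right)e^{-\theta_c\left(\Phi(y_{t,1}) + \Phi(y_{t,2})\right)}\right) - \\
& \ln\left(\left(1 - e^{-\theta_c}\right) - \left(1 - e^{-\theta_c\Phi(y_{t,1})}\right)\left(1 - e^{-\theta_c\Phi(y_{t,2})}\right)\right)^2 - \\
& \left(\frac{1}{2}\ln 2\pi + \frac{1}{2}\ln\sigma_1^2 + \frac{1}{2}y_{t,1}^2\right) - \left(\frac{1}{2}\ln 2\pi + \frac{1}{2}\ln\sigma_2^2 + \frac{1}{2}y_{t,2}^2\right)
\end{aligned}
\]
where $y_{t,i} = \sigma_i^{-1}(x_{t,i} - \mu_i)$ is the standardized return of asset $i$ for the observation $t$. The
vector of parameters to estimate is $\theta = (\mu_1,\sigma_1,\mu_2,\sigma_2,\theta_c)$. In the case of the IFM approach,
the parameters $(\mu_1,\sigma_1,\mu_2,\sigma_2)$ are estimated in a first step. Then, we estimate the copula
parameter $\theta_c$ by considering the following log-likelihood function:
\[
\begin{aligned}
\ell_t = {} & \ln\left(\theta_c\left(1 - e^{-\theta_c}\right)e^{-\theta_c\left(\Phi(\hat{y}_{t,1}) + \Phi(\hat{y}_{t,2})\right)}\right) - \\
& \ln\left(\left(1 - e^{-\theta_c}\right) - \left(1 - e^{-\theta_c\Phi(\hat{y}_{t,1})}\right)\left(1 - e^{-\theta_c\Phi(\hat{y}_{t,2})}\right)\right)^2
\end{aligned}
\]
where $\hat{y}_{t,i}$ is equal to $\hat{\sigma}_i^{-1}(x_{t,i} - \hat{\mu}_i)$. Finally, the OM approach uses the uniform variates
$u_{t,i} = R_{t,i}/T$ in the expression of the log-likelihood function:
\[
\begin{aligned}
\ell_t = {} & \ln\left(\theta_c\left(1 - e^{-\theta_c}\right)e^{-\theta_c\left(u_{t,1} + u_{t,2}\right)}\right) - \\
& \ln\left(\left(1 - e^{-\theta_c}\right) - \left(1 - e^{-\theta_c u_{t,1}}\right)\left(1 - e^{-\theta_c u_{t,2}}\right)\right)^2
\end{aligned}
\]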


Using the returns of MSCI Europe and US indices for the last 10 years, we obtain the
following results for the parameter $\theta_c$ of the Frank copula:

                                             Method of Moments
                     ML       IFM      OM       Kendall   Spearman
$\hat{\theta}_c$     6.809    6.184    4.149    3.982     3.721
$\tau$               0.554    0.524    0.399    0.387     0.367
$\varrho$            0.754    0.721    0.571    0.555     0.529


We obtain $\hat{\theta}_c = 6.809$ for the method of maximum likelihood and $\hat{\theta}_c = 6.184$ for the IFM
approach. These results are very close, which is not the case with the omnibus approach
where we obtain $\hat{\theta}_c = 4.149$. This means that the assumption of Gaussian marginals is far
from being verified. The specification of wrong marginals in ML and IFM approaches then
induces a bias in the estimation of the copula parameter. With the omnibus approach, we do
not face this issue because we consider non-parametric marginals. This explains why we
obtain a value that is close to the MM estimates (Kendall's tau and Spearman's rho).


For IFM and OM approaches, we can obtain a semi-analytical expression of $\hat{\theta}_c$ for some
specific copula functions. In the case of the Normal copula, the matrix $\rho$ of the parameters
is estimated with the following algorithm:


1. we first transform the uniform variates $u_{t,i}$ into Gaussian variates:
\[
n_{t,i} = \Phi^{-1}(u_{t,i})
\]
2. we then calculate the correlation matrix of the Gaussian variates $n_{t,i}$.
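The two steps above can be sketched as follows for a pair of series. This is a minimal illustration with hypothetical function names; the pseudo-observations are taken as $R_{t,i}/(T+1)$ rather than $R_{t,i}/T$, an assumption made here because $\Phi^{-1}(1)$ is infinite.

```python
import math
from statistics import NormalDist


def normal_scores(values):
    """Step 1: map ranks to Gaussian variates n_t = Phi^{-1}(R_t / (T + 1))."""
    T = len(values)
    std_normal = NormalDist()
    order = sorted(range(T), key=lambda t: values[t])
    scores = [0.0] * T
    for position, t in enumerate(order, start=1):
        scores[t] = std_normal.inv_cdf(position / (T + 1))
    return scores


def correlation(a, b):
    """Linear correlation between two lists of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def normal_copula_rho(x1, x2):
    """Step 2: correlation of the Gaussian variates (omnibus estimate of rho)."""
    return correlation(normal_scores(x1), normal_scores(x2))
```

For more than two series, the same function can be applied to every pair in order to fill the matrix $\hat{\rho}$.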
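For the Student's t copula, Bouyé et al. (2000) suggest the following algorithm:


1. let $\hat{\rho}_0$ be the estimated value of $\rho$ for the Normal copula;


2. $\hat{\rho}_{k+1}$ is obtained using the following equation:
\[
\hat{\rho}_{k+1} = \frac{1}{T}\,\frac{(\nu + n)}{\nu}\sum_{t=1}^{T}\frac{\varsigma_t\,\varsigma_t^{\top}}{1 + \frac{1}{\nu}\,\varsigma_t^{\top}\hat{\rho}_k^{-1}\varsigma_t}
\]
where:
\[
\varsigma_t = \begin{pmatrix} T_{\nu}^{-1}(u_{t,1}) \\ \vdots \\ T_{\nu}^{-1}(u_{t,n}) \end{pmatrix}
\]

3. repeat the second step until convergence: $\hat{\rho}_{k+1} = \hat{\rho}_k := \hat{\rho}_{\infty}$.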
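The fixed-point iteration can be sketched in the bivariate case ($n = 2$) with plain Python. The code is only an illustration under simplifying assumptions: the function takes the vectors $\varsigma_t$ as input (it does not compute the Student's t quantiles of the pseudo-observations), the starting value `rho0` is a scalar correlation supplied by the user, and the simulator `simulate_bivariate_t` is a hypothetical helper used to generate test data.

```python
import math
import random


def t_copula_rho_iteration(scores, nu, rho0=0.0, tol=1e-8, max_iter=200):
    """Iterate rho_{k+1} for n = 2, where scores is a list of pairs (z1, z2).

    The 2x2 symmetric matrix is stored as (a, b, c) = (rho_11, rho_12, rho_22).
    """
    n, T = 2, len(scores)
    a, b, c = 1.0, rho0, 1.0
    for _ in range(max_iter):
        det = a * c - b * b
        sa = sb = sc = 0.0
        for z1, z2 in scores:
            # Quadratic form z' rho_k^{-1} z for a 2x2 symmetric matrix
            q = (c * z1 * z1 - 2.0 * b * z1 * z2 + a * z2 * z2) / det
            w = 1.0 / (1.0 + q / nu)
            sa += w * z1 * z1
            sb += w * z1 * z2
            sc += w * z2 * z2
        factor = (nu + n) / (nu * T)
        new_a, new_b, new_c = factor * sa, factor * sb, factor * sc
        change = max(abs(new_a - a), abs(new_b - b), abs(new_c - c))
        a, b, c = new_a, new_b, new_c
        if change < tol:
            break
    return [[a, b], [b, c]]


def simulate_bivariate_t(T, rho, nu, seed=0):
    """Simulate (T1, T2) = (N1, N2) / sqrt(X / nu) with correlated Gaussians."""
    rng = random.Random(seed)
    sample = []
    for _ in range(T):
        n1 = rng.gauss(0.0, 1.0)
        n2 = rho * n1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x = rng.gammavariate(nu / 2.0, 2.0)   # chi-square with nu degrees of freedom
        scale = math.sqrt(x / nu)
        sample.append((n1 / scale, n2 / scale))
    return sample
```

The limit matrix is not constrained to have a unit diagonal, so one way to read off a correlation parameter is to rescale it, for example `m[0][1] / math.sqrt(m[0][0] * m[1][1])`; this rescaling is a convention of the example.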


Let us consider Example 124. We have estimated the parameter matrix $\rho$ of Normal
and Student's t copulas using the omnibus approach. Results are given in Tables 11.6, 11.7
and 11.8. We notice that these matrices are different from the correlation matrix calculated
in Table 11.3. The reason is that we have previously assumed that the marginals were
Gaussian. In this case, the ML estimate introduced a bias in the copula parameter in order
to compensate the bias induced by the wrong specification of the marginals.

TABLE 11.6: Omnibus estimate $\hat{\rho}$ (Normal copula)

             EU Equity   US Equity   Sovereign   Credit   Commodity
EU Equity        100.0
US Equity         56.4       100.0
Sovereign        -32.5       -32.1       100.0
Credit           -16.3       -30.3        70.2    100.0
Commodity         46.5        30.7       -21.1    -14.7       100.0


TABLE 11.7: Omnibus estimate $\hat{\rho}$ (Student's t copula with $\nu = 1$)

             EU Equity   US Equity   Sovereign   Credit   Commodity
EU Equity        100.0
US Equity         47.1       100.0
Sovereign        -20.3       -18.9       100.0
Credit            -9.3       -22.1        57.6    100.0
Commodity         28.0        17.1        -7.4     -6.2       100.0


TABLE 11.8: Omnibus estimate $\hat{\rho}$ (Student's t copula with $\nu = 4$)

             EU Equity   US Equity   Sovereign   Credit   Commodity
EU Equity        100.0
US Equity         59.6       100.0
Sovereign        -31.5       -31.9       100.0
Credit           -18.3       -32.9        71.3    100.0
Commodity         43.0        30.5       -17.2    -13.4       100.0


Remark 141 The discrepancy between the ML or IFM estimate and the OM estimate is
useful information for assessing whether the specification of the marginals is right or not.
In particular, a large discrepancy indicates that the estimated marginals are far from the
empirical marginals.


11.5 Exercises 
11.5.1 Gumbel logistic copula 
1. Calculate the density of the Gumbel logistic copula. 


2. Show that it has a lower tail dependence, but no upper tail dependence. 


11.5.2 Farlie-Gumbel-Morgenstern copula 


We consider the following function: 


\[
C(u_1,u_2) = u_1 u_2\left(1 + \theta(1 - u_1)(1 - u_2)\right) \tag{11.8}
\]

1. Show that $C$ is a copula function for $\theta \in [-1,1]$.


2. Calculate the tail dependence coefficient $\lambda$, the Kendall's $\tau$ statistic and the
Spearman's $\varrho$ statistic.


3. Let $X = (X_1,X_2)$ be a bivariate random vector. We assume that $X_1 \sim \mathcal{N}(\mu,\sigma^2)$
and $X_2 \sim \mathcal{E}(\lambda)$. Propose an algorithm to simulate $(X_1,X_2)$ when the copula is the
function (11.8).


4. Calculate the log-likelihood function of the sample $\{(x_{1,i},x_{2,i})\}_{i=1}^{n}$.


11.5.3 Survival copula 


Let $S$ be the bivariate function defined by:
\[
S(x_1,x_2) = \exp\left(-\left(x_1 + x_2 - \theta\,\frac{x_1 x_2}{x_1 + x_2}\right)\right)
\]
with $\theta \in [0,1]$, $x_1 \geq 0$ and $x_2 \geq 0$.


1. Verify that S is a survival function. 


2. Define the survival copula associated to S. 


11.5.4 Method of moments 


Let $(X_1,X_2)$ be a bivariate random vector such that $X_1 \sim \mathcal{N}(\mu_1,\sigma_1^2)$ and
$X_2 \sim \mathcal{N}(\mu_2,\sigma_2^2)$. We consider that the dependence function is given by the following copula:
\[
C(u_1,u_2) = \theta \cdot C^{-}(u_1,u_2) + (1 - \theta) \cdot C^{+}(u_1,u_2)
\]
where $\theta \in [0,1]$ is the copula parameter.


1. We assume that $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$. Find the parameter $\theta$ such that the
linear correlation of $X_1$ and $X_2$ is equal to zero. Show that there exists a function $f$
such that $X_1 = f(X_2)$. Comment on this result.


2. Calculate the linear correlation of $X_1$ and $X_2$ as a function of the parameters $\mu_1$, $\mu_2$,
$\sigma_1$, $\sigma_2$ and $\theta$.


3. Propose a method of moments to estimate $\theta$.


11.5.5 Correlated loss given default rates 


We assume that the probability distribution of the (annual) loss given default rate
associated to a risk class $\mathcal{C}$ is given by:
\[
\begin{aligned}
F(x) &= \Pr\{\mathrm{LGD} \leq x\} \\
&= x^{\gamma}
\end{aligned}
\]

1. Find the conditions on the parameter $\gamma$ that are necessary for $F$ to be a probability
distribution.


2. Let $\{x_1,\ldots,x_n\}$ be a sample of loss given default rates. Calculate the log-likelihood
function and deduce the ML estimator $\hat{\gamma}_{\mathrm{ML}}$.

3. Calculate the first moment $\mathbb{E}[\mathrm{LGD}]$. Then find the method of moments estimator
$\hat{\gamma}_{\mathrm{MM}}$.


4. We assume that $x_i = 50\%$ for all $i$. Calculate the numerical values taken by $\hat{\gamma}_{\mathrm{ML}}$ and
$\hat{\gamma}_{\mathrm{MM}}$. Comment on these results.


5. We now consider two risk classes $\mathcal{C}_1$ and $\mathcal{C}_2$ and note $\mathrm{LGD}_1$ and $\mathrm{LGD}_2$ the corresponding
LGD rates. We assume that the dependence function between $\mathrm{LGD}_1$ and $\mathrm{LGD}_2$
is given by the Gumbel-Barnett copula:
\[
C(u_1,u_2;\theta) = u_1 u_2\,e^{-\theta \ln u_1 \ln u_2}
\]
where $\theta$ is the copula parameter. Show that the density function of the copula is equal
to:
\[
c(u_1,u_2;\theta) = \left(1 - \theta - \theta\ln(u_1 u_2) + \theta^2 \ln u_1 \ln u_2\right)e^{-\theta \ln u_1 \ln u_2}
\]

6. Deduce the log-likelihood function of the historical sample $\{(x_i,y_i)\}_{i=1}^{n}$.


7. We note $\hat{\gamma}_1$, $\hat{\gamma}_2$ and $\hat{\theta}$ the ML estimators of the parameters $\gamma_1$ (risk class $\mathcal{C}_1$), $\gamma_2$ (risk
class $\mathcal{C}_2$) and $\theta$ (copula parameter). Why does the ML estimator $\hat{\gamma}_1$ not correspond to
the ML estimator $\hat{\gamma}_{\mathrm{ML}}$, except in the case $\hat{\theta} = 0$? Illustrate with an example.


11.5.6 Calculation of correlation bounds 


1. Give the mathematical definition of the copula functions $C^{-}$, $C^{\perp}$ and $C^{+}$. What is
the probabilistic interpretation of these copulas?


2. We note $\tau$ and LGD the default time and the loss given default of a counterparty. We
assume that $\tau \sim \mathcal{E}(\lambda)$ and $\mathrm{LGD} \sim \mathcal{U}_{[0,1]}$.


(a) Show that the dependence between $\tau$ and LGD is maximum when the following
equality holds:
\[
\mathrm{LGD} + e^{-\lambda\tau} - 1 = 0
\]

(b) Show that the linear correlation $\rho(\tau,\mathrm{LGD})$ verifies the following inequality:
\[
|\rho(\tau,\mathrm{LGD})| \leq \frac{\sqrt{3}}{2}
\]

(c) Comment on these results.

3. We consider two exponential default times $\tau_1$ and $\tau_2$ with parameters $\lambda_1$ and $\lambda_2$.


(a) We assume that the dependence function between $\tau_1$ and $\tau_2$ is $C^{+}$. Demonstrate
that the following relationship is true:
\[
\tau_1 = \frac{\lambda_2}{\lambda_1}\,\tau_2
\]

(b) Show that there exists a function $f$ such that $\tau_2 = f(\tau_1)$ when the dependence
function is $C^{-}$.
(c) Show that the lower and upper bounds of the linear correlation satisfy the following
relationship:
\[
-1 < \rho(\tau_1,\tau_2) \leq 1
\]




(d) In the more general case, show that the linear correlation of a random vector
$(X_1,X_2)$ cannot be equal to $-1$ if the support of the random variables $X_1$ and
$X_2$ is $[0,+\infty]$.


4. We assume that $(X_1,X_2)$ is a Gaussian random vector where $X_1 \sim \mathcal{N}(\mu_1,\sigma_1^2)$,
$X_2 \sim \mathcal{N}(\mu_2,\sigma_2^2)$ and $\rho$ is the linear correlation between $X_1$ and $X_2$. We note
$\theta = (\mu_1,\sigma_1,\mu_2,\sigma_2,\rho)$ the set of parameters.


(a) Find the probability distribution of $X_1 + X_2$.


(b) Then show that the covariance between $Y_1 = e^{X_1}$ and $Y_2 = e^{X_2}$ is equal to:
\[
\mathrm{cov}(Y_1,Y_2) = e^{\mu_1 + \frac{1}{2}\sigma_1^2}\,e^{\mu_2 + \frac{1}{2}\sigma_2^2}\left(e^{\rho\sigma_1\sigma_2} - 1\right)
\]

(c) Deduce the correlation between $Y_1$ and $Y_2$.


(d) For which values of $\theta$ does the equality $\rho(Y_1,Y_2) = +1$ hold? Same question
when $\rho(Y_1,Y_2) = -1$.


(e) We consider the bivariate Black-Scholes model:
\[
\begin{cases}
\mathrm{d}S_1(t) = \mu_1 S_1(t)\,\mathrm{d}t + \sigma_1 S_1(t)\,\mathrm{d}W_1(t) \\
\mathrm{d}S_2(t) = \mu_2 S_2(t)\,\mathrm{d}t + \sigma_2 S_2(t)\,\mathrm{d}W_2(t)
\end{cases}
\]
with $\mathbb{E}[W_1(t)W_2(t)] = \rho\,t$. Deduce the linear correlation between $S_1(t)$ and
$S_2(t)$. Find the limit case $\lim_{t \to \infty}\rho(S_1(t),S_2(t))$.


(f) Comment on these results.


11.5.7 The bivariate Pareto copula 


We consider the bivariate Pareto distribution:
\[
F(x_1,x_2) = 1 - \left(\frac{\theta_1 + x_1}{\theta_1}\right)^{-\alpha} - \left(\frac{\theta_2 + x_2}{\theta_2}\right)^{-\alpha} + \left(\frac{\theta_1 + x_1}{\theta_1} + \frac{\theta_2 + x_2}{\theta_2} - 1\right)^{-\alpha}
\]
where $x_1 \geq 0$, $x_2 \geq 0$, $\theta_1 > 0$, $\theta_2 > 0$ and $\alpha > 0$.


1. Show that the marginal functions of $F(x_1,x_2)$ correspond to univariate Pareto
distributions.


2. Find the copula function associated to the bivariate Pareto distribution.
3. Deduce the copula density function.


4. Show that the bivariate Pareto copula function has no lower tail dependence, but an
upper tail dependence.


5. Do you think that the bivariate Pareto copula family can reach the copula functions
$C^{-}$, $C^{\perp}$ and $C^{+}$? Justify your answer.


6. Let $X_1$ and $X_2$ be two Pareto distributed random variables, whose parameters are
$(\alpha_1,\theta_1)$ and $(\alpha_2,\theta_2)$.

(a) Show that the linear correlation between $X_1$ and $X_2$ is equal to 1 if and only if
the parameters $\alpha_1$ and $\alpha_2$ are equal.

(b) Show that the linear correlation between $X_1$ and $X_2$ can never reach the lower
bound $-1$.

(c) Build a new bivariate Pareto distribution by assuming that the marginal distributions
are $\mathcal{P}(\alpha_1,\theta_1)$ and $\mathcal{P}(\alpha_2,\theta_2)$ and the dependence function is a bivariate
Pareto copula with parameter $\alpha$. What is the relevance of this approach for
building bivariate Pareto distributions?


Chapter 12 


Extreme Value Theory 


This chapter is dedicated to tail (or extreme) risk modeling. Tail risk covers two notions.
The first one is related to rare events, meaning that a severe loss may occur with a very small
probability. The second one concerns the magnitude of a loss that is difficult to reconcile
with the observed volatility of the portfolio. Of course, the two notions are connected, but
the second is more frequent. For instance, stock market crashes have been numerous since the end
of the eighties. The study of these rare or abnormal events needs an appropriate framework
to analyze their risk. This is the subject of this chapter. In the first section, we consider order
statistics, which are very useful to understand the underlying concept of tail risk. Then, we
present the extreme value theory (EVT) in the unidimensional case. Finally, the last section
deals with the correlation issue between extreme risks.


12.1 Order statistics 
12.1.1 Main properties 

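Let $X_1, \ldots, X_n$ be iid random variables, whose probability distribution is denoted by
F. We rank these random variables by increasing order:

$$
X_{1:n} \leq X_{2:n} \leq \cdots \leq X_{n-1:n} \leq X_{n:n}
$$

$X_{i:n}$ is called the i-th order statistic in the sample of size n. We note $x_{i:n}$ the corresponding
random variate or the value taken by $X_{i:n}$. We have:

$$
\begin{aligned}
F_{i:n}(x) &= \Pr\{X_{i:n} \leq x\} \\
&= \Pr\{\text{at least } i \text{ variables among } X_1, \ldots, X_n \text{ are less or equal to } x\} \\
&= \sum_{k=i}^{n} \Pr\{k \text{ variables among } X_1, \ldots, X_n \text{ are less or equal to } x\} \\
&= \sum_{k=i}^{n} \binom{n}{k} F(x)^k \left(1 - F(x)\right)^{n-k}
\end{aligned}
\tag{12.1}
$$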

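We note f the density function of F. We deduce that the density function of $X_{i:n}$ has the
following expression:

$$
f_{i:n}(x) = \sum_{k=i}^{n} \binom{n}{k} k\,F(x)^{k-1} \left(1 - F(x)\right)^{n-k} f(x) - \sum_{k=i}^{n-1} \binom{n}{k} F(x)^{k} (n-k) \left(1 - F(x)\right)^{n-k-1} f(x)
\tag{12.2}
$$

It follows that:

$$
\begin{aligned}
f_{i:n}(x) &= \sum_{k=i}^{n} \frac{n!}{(k-1)!\,(n-k)!} F(x)^{k-1} \left(1 - F(x)\right)^{n-k} f(x) - \sum_{k=i}^{n-1} \frac{n!}{k!\,(n-k-1)!} F(x)^{k} \left(1 - F(x)\right)^{n-k-1} f(x) \\
&= \sum_{k=i}^{n} \frac{n!}{(k-1)!\,(n-k)!} F(x)^{k-1} \left(1 - F(x)\right)^{n-k} f(x) - \sum_{k=i+1}^{n} \frac{n!}{(k-1)!\,(n-k)!} F(x)^{k-1} \left(1 - F(x)\right)^{n-k} f(x) \\
&= \frac{n!}{(i-1)!\,(n-i)!} F(x)^{i-1} \left(1 - F(x)\right)^{n-i} f(x)
\end{aligned}
\tag{12.3}
$$

Remark 142 When k is equal to n, the derivative of $(1 - F(x))^{n-k}$ is equal to zero. This
explains that the second summation in Equation (12.2) does not include the case k = n.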
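Example 126 If $X_1, \ldots, X_n$ follow a uniform distribution $\mathcal{U}_{[0,1]}$, we obtain:

$$
F_{i:n}(x) = \sum_{k=i}^{n} \binom{n}{k} x^k (1 - x)^{n-k} = \mathrm{IB}(x; i, n - i + 1)
$$

where $\mathrm{IB}(x; \alpha, \beta)$ is the regularized incomplete beta function¹:

$$
\mathrm{IB}(x; \alpha, \beta) = \frac{1}{\mathrm{B}(\alpha, \beta)} \int_0^x t^{\alpha-1} (1 - t)^{\beta-1}\,\mathrm{d}t
$$

We deduce that $X_{i:n} \sim \mathcal{B}(i, n - i + 1)$. It follows that the expected value of the order statistic
$X_{i:n}$ is equal to:

$$
\mathbb{E}[X_{i:n}] = \mathbb{E}[\mathcal{B}(i, n - i + 1)] = \frac{i}{n+1}
$$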

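We verify the stochastic ordering:

$$
j > i \Rightarrow F_{i:n}(x) \geq F_{j:n}(x)
$$

Indeed, we have:

$$
F_{i:n}(x) = \sum_{k=i}^{n} \binom{n}{k} F(x)^k \left(1 - F(x)\right)^{n-k} = F_{j:n}(x) + \sum_{k=i}^{j-1} \binom{n}{k} F(x)^k \left(1 - F(x)\right)^{n-k}
$$

meaning that $F_{i:n}(x) \geq F_{j:n}(x)$. In Figure 12.1, we illustrate this property when the
random variables $X_1, \ldots, X_n$ follow the normal distribution $\mathcal{N}(0,1)$. We verify that the
distribution function $F_{i:n}$ shifts to the right as the order i increases, that is, $F_{i:n}(x)$
decreases with i for a given value of x.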


¹It is also the Beta probability distribution: $\mathrm{IB}(x; \alpha, \beta) = \Pr\{\mathcal{B}(\alpha, \beta) \leq x\}$.

FIGURE 12.1: Distribution function $F_{i:n}$ when the random variables $X_1, \ldots, X_n$ are
Gaussian


12.1.2 Extreme order statistics 


Two order statistics are particularly interesting for the study of rare events. They are 
the lowest and highest order statistics: 


$$
X_{1:n} = \min(X_1, \ldots, X_n)
$$

and:

$$
X_{n:n} = \max(X_1, \ldots, X_n)
$$

We can find their probability distributions by setting i = 1 and i = n in Formula (12.1).
We can also retrieve their expression by noting that:

$$
\begin{aligned}
F_{1:n}(x) &= \Pr\{\min(X_1, \ldots, X_n) \leq x\} \\
&= 1 - \Pr\{\min(X_1, \ldots, X_n) > x\} \\
&= 1 - \Pr\{X_1 > x, X_2 > x, \ldots, X_n > x\} \\
&= 1 - \prod_{i=1}^{n} \Pr\{X_i > x\} \\
&= 1 - \prod_{i=1}^{n} \left(1 - \Pr\{X_i \leq x\}\right) \\
&= 1 - \left(1 - F(x)\right)^n
\end{aligned}
$$

and:

$$
\begin{aligned}
F_{n:n}(x) &= \Pr\{\max(X_1, \ldots, X_n) \leq x\} \\
&= \Pr\{X_1 \leq x, X_2 \leq x, \ldots, X_n \leq x\} \\
&= \prod_{i=1}^{n} \Pr\{X_i \leq x\} \\
&= F(x)^n
\end{aligned}
$$

We deduce that the density functions are equal to:

$$
f_{1:n}(x) = n \left(1 - F(x)\right)^{n-1} f(x)
$$

and:

$$
f_{n:n}(x) = n\,F(x)^{n-1} f(x)
$$

Let us consider an example with the Gaussian distribution $\mathcal{N}(0,1)$. Figure 12.2 shows the
evolution of the density function $f_{n:n}$ with respect to the sample size n. We verify the
stochastic ordering: $n > m \Rightarrow F_{n:n}(x) \leq F_{m:m}(x)$.


FIGURE 12.2: Density function $f_{n:n}$ of the Gaussian random variable $\mathcal{N}(0,1)$


Let us now illustrate the impact of the probability distribution tails on order statistics. 
We consider the daily returns of the MSCI USA index from 1995 to 2015. We consider three 
hypotheses: 


H1 Daily returns are Gaussian, meaning that:

$$
R_t = \hat{\mu} + \hat{\sigma} X_t
$$

where $X_t \sim \mathcal{N}(0,1)$, $\hat{\mu}$ is the empirical mean of daily returns and $\hat{\sigma}$ is the daily
standard deviation.


H2 Daily returns follow a Student's t distribution²:

$$
R_t = \hat{\mu} + \hat{\sigma} \sqrt{\frac{\nu - 2}{\nu}}\, X_t
$$

where $X_t \sim t_\nu$. We consider two alternative assumptions: H2a: $\nu = 3$ and H2b: $\nu = 6$.


²We add the factor $\sqrt{(\nu - 2)/\nu}$ in order to verify that $\mathrm{var}(R_t) = \hat{\sigma}^2$.

FIGURE 12.3: Density function of the maximum order statistic (daily return of the MSCI
USA index, 1995-2015)


We represent the probability density function of $R_{n:n}$ for several values of n in Figure
12.3. When n is equal to one trading day, $R_{n:n}$ is exactly the daily return. In this case,
it is difficult to observe the impact of the probability distribution tail. However, when n
increases, the impact becomes more and more important. Order statistics amplify
local phenomena of probability distributions. In particular, extreme order statistics are a
very useful tool to analyze left and right tails.


Remark 143 The limit distributions of minima and maxima are given by the following
results:

$$
\lim_{n \to \infty} F_{1:n}(x) = \lim_{n \to \infty} 1 - \left(1 - F(x)\right)^n =
\begin{cases}
0 & \text{if } F(x) = 0 \\
1 & \text{if } F(x) > 0
\end{cases}
$$

and:

$$
\lim_{n \to \infty} F_{n:n}(x) = \lim_{n \to \infty} F(x)^n =
\begin{cases}
0 & \text{if } F(x) < 1 \\
1 & \text{if } F(x) = 1
\end{cases}
$$

We deduce that the limit distributions are degenerate as they only take values of 0 and
1. This property is very important, because it means that we cannot study extreme events
by considering these limit distributions. This is why the extreme value theory is based on
another convergence approach of extreme order statistics.

12.1.3 Inference statistics


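The common approach to estimate the parameters $\theta$ of the probability density function
$f(x; \theta)$ is to maximize the log-likelihood function of a given sample $\{x_1, \ldots, x_T\}$:

$$
\hat{\theta} = \arg\max \sum_{t=1}^{T} \ln f(x_t; \theta)
$$

In a similar way, we can consider the sample³ $\{x'_1, \ldots, x'_{n_S}\}$ of the order statistic $X_{i:n}$ and
estimate the parameters $\theta$ by the method of maximum likelihood:

$$
\hat{\theta}_{i:n} = \arg\max \ell_{i:n}(\theta)
$$

where:

$$
\begin{aligned}
\ell_{i:n}(\theta) &= \sum_{s=1}^{n_S} \ln f_{i:n}(x'_s; \theta) \\
&= \sum_{s=1}^{n_S} \ln \left( \frac{n!}{(i-1)!\,(n-i)!} F(x'_s; \theta)^{i-1} \left(1 - F(x'_s; \theta)\right)^{n-i} f(x'_s; \theta) \right)
\end{aligned}
$$

The computation of the log-likelihood function gives:

$$
\begin{aligned}
\ell_{i:n}(\theta) = {} & n_S \ln n! - n_S \ln (i-1)! - n_S \ln (n-i)! + {} \\
& (i-1) \sum_{s=1}^{n_S} \ln F(x'_s; \theta) + (n-i) \sum_{s=1}^{n_S} \ln \left(1 - F(x'_s; \theta)\right) + {} \\
& \sum_{s=1}^{n_S} \ln f(x'_s; \theta)
\end{aligned}
$$

By definition, the traditional ML estimator is equal to the new ML estimator when n = 1 and
i = 1:

$$
\hat{\theta} = \hat{\theta}_{1:1}
$$

In the other cases (n > 1), there is no reason that the two estimators coincide exactly:

$$
\hat{\theta}_{i:n} \neq \hat{\theta}
$$

However, if the random variates are drawn from the distribution function $X \sim F(x; \theta)$, we
can test the hypothesis $\mathcal{H}: \theta_{i:n} = \theta$ for all n and $i \leq n$. If two estimates $\hat{\theta}_{i:n}$ and $\hat{\theta}_{i':n'}$ are
very different, this indicates that the distribution function is certainly not appropriate for
modeling the random variable X.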
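The log-likelihood of an order statistic sample is straightforward to code once F and f are available. The sketch below is an illustration under a stated assumption: it takes F to be the Gaussian distribution N(mu, sigma²) because the Python standard library provides its cdf and pdf, whereas the example that follows in the text uses Student's t distributions. The function name and arguments are ours.

```python
from math import lgamma, log
from statistics import NormalDist


def order_stat_loglik(sample, i, n, mu, sigma):
    """Log-likelihood of a sample of the order statistic X_{i:n}
    when the underlying variables are assumed to be N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    # ln n! - ln (i-1)! - ln (n-i)!, using lgamma(k + 1) = ln k!
    const = lgamma(n + 1) - lgamma(i) - lgamma(n - i + 1)
    total = len(sample) * const
    for x in sample:
        cdf = dist.cdf(x)
        if i > 1:                       # (i - 1) ln F(x)
            total += (i - 1) * log(cdf)
        if n > i:                       # (n - i) ln (1 - F(x))
            total += (n - i) * log(1.0 - cdf)
        total += log(dist.pdf(x))       # ln f(x)
    return total


sample = [-0.5, 0.1, 1.2]
# With n = 1 and i = 1, we retrieve the traditional log-likelihood
print(order_stat_loglik(sample, 1, 1, 0.0, 1.0))
print(sum(log(NormalDist().pdf(x)) for x in sample))
```

Maximizing this function over (mu, sigma) for several pairs (i, n) and comparing the estimates is the diagnostic described above; the optimizer itself is left out of the sketch.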

Let us consider the previous example with the returns of the MSCI USA index. We
assume that the daily returns can be modeled with the Student's t distribution:

$$
\frac{R_t - \mu}{\sigma} \sim t_\nu
$$

The vector of parameters to estimate is then $\theta = (\mu, \sigma)$. In Tables 12.1, 12.2 and 12.3,
we report the values taken by the ML estimator $\hat{\sigma}_{i:n}$ obtained by considering several order
statistics and three values of $\nu$. For instance, the ML estimate $\hat{\sigma}_{1:1}$ in the case of the $t_1$
distribution is equal to 50 bps. We notice that the values taken by $\hat{\sigma}_{i:n}$ are not very stable


³The size of the sample $n_S$ is equal to the size of the original sample T divided by n.

TABLE 12.1: ML estimate of σ (in bps) for the probability distribution $t_1$


Size n    Order i
          1 2 3 4 5 6 7 8 9 10


32 49 55 56 55 45 29 

31 48 53 55 54 50 43 26 

29 46 55 56 57 55 49 40 25 
53 48 37 20

TABLE 12.2: ML estimate of σ (in bps) for the probability distribution $t_6$


Size n    Order i
          1 2 3 4 5 6 7 8 9 10


102 116 103 89 85 89 98 89 
105 121 117 97 85 86 94 101 88 
120 108 91 87 92 99 104 88

TABLE 12.3: ML estimate of σ (in bps) for the probability distribution $t_\infty$


Size n    Order i
          1 2 3 4 5 6 7 8 9 10
125 
125 124 
136 116 129 
147 116 112 140 
103 114 150 


163 142 118 107 122 157 

171 152 125 105 117 134 162 

175 165 130 106 99 111 139 170 

180 174 155 122 95 99 128 152 171 
162 136 110 100 111 127 155 181

with respect to i and n. This indicates that the three probability distribution functions
($t_1$, $t_6$ and $t_\infty$) are not well suited to represent the index returns. In Figure 12.4, we
have reported the corresponding annualized volatility⁴ calculated from the order statistics
$R_{i:10}$. In the case of the $t_1$ distribution, we notice that it is lower for median order statistics
than for extreme order statistics. The $t_1$ distribution therefore tends to overestimate
extreme events. In the case of the Gaussian (or $t_\infty$) distribution, we obtain the opposite result.
The Gaussian distribution tends to underestimate extreme events. In order to
compensate for this bias, the method of maximum likelihood applied to extreme order statistics
will overestimate the volatility.


FIGURE 12.4: Annualized volatility (in %) calculated from the order statistics $R_{i:10}$


Remark 144 The approach based on extreme order statistics to calculate the volatility is 
then a convenient way to reduce the under-estimation of the Gaussian value-at-risk. 


12.1.4 Extension to dependent random variables 
Let us now assume that $X_1, \ldots, X_n$ are not iid. We note C the copula of the corresponding
random vector. It follows that:

$$
\begin{aligned}
F_{n:n}(x) &= \Pr\{X_{n:n} \leq x\} \\
&= \Pr\{X_1 \leq x, \ldots, X_n \leq x\} \\
&= C\left(F_1(x), \ldots, F_n(x)\right)
\end{aligned}
$$

and:

$$
\begin{aligned}
F_{1:n}(x) &= \Pr\{X_{1:n} \leq x\} \\
&= 1 - \Pr\{X_{1:n} > x\} \\
&= 1 - \Pr\{X_1 > x, \ldots, X_n > x\} \\
&= 1 - \breve{C}\left(1 - F_1(x), \ldots, 1 - F_n(x)\right)
\end{aligned}
$$

where $\breve{C}$ is the survival copula associated to C.


⁴The annualized volatility takes the value $\sqrt{260} \cdot c \cdot \hat{\sigma}_{i:n}$ where the constant c is equal to $\sqrt{\nu/(\nu - 2)}$.
In the case of the $t_1$ distribution, c is equal to 3.2.


Remark 145 In the case of the product copula and identical probability distributions, we
retrieve the previous results:

$$
F_{n:n}(x) = C^\perp\left(F(x), \ldots, F(x)\right) = F(x)^n
$$

and:

$$
F_{1:n}(x) = 1 - C^\perp\left(1 - F(x), \ldots, 1 - F(x)\right) = 1 - \left(1 - F(x)\right)^n
$$


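If we are interested in other order statistics, we use the following formula given in Georges
et al. (2001):

$$
F_{i:n}(x) = \sum_{k=i}^{n} \left[ \sum_{l=i}^{k} (-1)^{k-l} \binom{k}{l} \sum_{v(F_1(x), \ldots, F_n(x)) \in \mathcal{Z}(n-k, n)} C(v_1, \ldots, v_n) \right]
$$

where:

$$
\mathcal{Z}(m, n) = \left\{ v \in [0,1]^n \;\middle|\; v_i \in \{u_i, 1\},\ \sum_{i=1}^{n} \mathbb{1}\{v_i = 1\} = m \right\}
$$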


In order to understand this formula, we consider the case n = 3. We have⁵:

$$
\begin{aligned}
F_{1:3}(x) = {} & F_1(x) + F_2(x) + F_3(x) - {} \\
& C(F_1(x), F_2(x), 1) - C(F_1(x), 1, F_3(x)) - C(1, F_2(x), F_3(x)) + {} \\
& C(F_1(x), F_2(x), F_3(x)) \\
F_{2:3}(x) = {} & C(F_1(x), F_2(x), 1) + C(F_1(x), 1, F_3(x)) + C(1, F_2(x), F_3(x)) - {} \\
& 2\,C(F_1(x), F_2(x), F_3(x)) \\
F_{3:3}(x) = {} & C(F_1(x), F_2(x), F_3(x))
\end{aligned}
$$

We verify that:

$$
F_{1:3}(x) + F_{2:3}(x) + F_{3:3}(x) = F_1(x) + F_2(x) + F_3(x)
$$

The dependence structure has a big impact on the distribution of order statistics. For
instance, if we assume that $X_1, \ldots, X_n$ are iid, we obtain:

$$
F_{n:n}(x) = F(x)^n
$$

⁵Because $C(F_1(x), 1, 1) = F_1(x)$.

If the copula function is the upper Fréchet copula, this result becomes:

$$
\begin{aligned}
F_{n:n}(x) &= C^+\left(F(x), \ldots, F(x)\right) \\
&= \min\left(F(x), \ldots, F(x)\right) \\
&= F(x)
\end{aligned}
$$

This implies that the occurrence probability of extreme events is lower in this second case.


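We consider n Weibull default times $\tau_i \sim \mathcal{W}(\lambda_i, \gamma_i)$. The survival function is equal
to $S_i(t) = \exp(-\lambda_i t^{\gamma_i})$. The hazard rate $\lambda_i(t)$ is then $\lambda_i \gamma_i t^{\gamma_i - 1}$ and the expression of the
density is $f_i(t) = \lambda_i(t)\,S_i(t)$. If we assume that the survival copula is the Gumbel-Hougaard
copula with parameter $\theta \geq 1$, the survival function of the first-to-default time is equal to:

$$
\begin{aligned}
S_{1:n}(t) &= \exp\left(-\left(\left(-\ln S_1(t)\right)^\theta + \ldots + \left(-\ln S_n(t)\right)^\theta\right)^{1/\theta}\right) \\
&= \exp\left(-\left(\sum_{i=1}^{n} \lambda_i^\theta t^{\theta \gamma_i}\right)^{1/\theta}\right)
\end{aligned}
$$

We deduce the expression of the density function:

$$
f_{1:n}(t) = \left(\sum_{i=1}^{n} \lambda_i^\theta t^{\theta \gamma_i}\right)^{1/\theta - 1} \left(\sum_{i=1}^{n} \gamma_i \lambda_i^\theta t^{\theta \gamma_i - 1}\right) \exp\left(-\left(\sum_{i=1}^{n} \lambda_i^\theta t^{\theta \gamma_i}\right)^{1/\theta}\right)
$$

In the case where the default times are identically distributed, the first-to-default time is a
Weibull default time: $\tau_{1:n} \sim \mathcal{W}(n^{1/\theta} \lambda, \gamma)$. In Figure 12.5, we report the density function
$f_{1:10}(t)$ for the parameters $\lambda = 3\%$ and $\gamma = 2$. We notice that the parameter $\theta$ of the copula
function has a big influence on the first-to-default time. The case $\theta = 1$ corresponds to the
product copula and we retrieve the previous result:

$$
S_{1:n}(t) = S(t)^n
$$

When the Gumbel-Hougaard copula is the upper Fréchet copula ($\theta \to \infty$), we verify that the
density function of $\tau_{1:n}$ is that of any default time $\tau_i$.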
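The three properties stated above (the Weibull form in the identically distributed case, the product copula when θ = 1, and the comonotonic limit) can be verified numerically. The sketch below is illustrative: the function names are ours, and θ = 200 is used as a stand-in for the limit θ → ∞, since much larger values would underflow in floating-point arithmetic.

```python
from math import exp


def weibull_survival(t, lam, gam):
    """S(t) = exp(-lam * t^gam)."""
    return exp(-lam * t**gam)


def first_to_default_survival(t, lambdas, gammas, theta):
    """S_{1:n}(t) for Weibull default times under a Gumbel-Hougaard survival copula."""
    # (-ln S_i(t))^theta = (lam_i * t^gam_i)^theta
    s = sum((lam * t**gam) ** theta for lam, gam in zip(lambdas, gammas))
    return exp(-(s ** (1.0 / theta)))


n, lam, gam, t = 10, 0.03, 2.0, 1.5
lambdas, gammas = [lam] * n, [gam] * n

# Identical margins: tau_{1:n} ~ W(n^(1/theta) * lam, gam)
theta = 2.0
print(first_to_default_survival(t, lambdas, gammas, theta),
      weibull_survival(t, n ** (1.0 / theta) * lam, gam))
# theta = 1 (product copula): S_{1:n}(t) = S(t)^n
print(first_to_default_survival(t, lambdas, gammas, 1.0),
      weibull_survival(t, lam, gam) ** n)
# Large theta (close to the upper Frechet copula): S_{1:n}(t) is close to S(t)
print(first_to_default_survival(t, lambdas, gammas, 200.0),
      weibull_survival(t, lam, gam))
```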


12.2 Univariate extreme value theory 


The extreme value theory consists in studying the limit distribution of extreme order
statistics $X_{1:n}$ and $X_{n:n}$ when the sample size tends to infinity. We will see that the limit
distribution, when it is not degenerate, belongs to one of three families of probability
distributions. This result will help to evaluate
stress scenarios and to build a stress testing framework.


Remark 146 In what follows, we only consider the largest order statistic $X_{n:n}$. Indeed, the
minimum order statistic $X_{1:n}$ can be defined with respect to the maximum order statistic
$Y_{n:n}$ by setting $Y_i = -X_i$:

$$
\begin{aligned}
X_{1:n} &= \min(X_1, \ldots, X_n) \\
&= \min(-Y_1, \ldots, -Y_n) \\
&= -\max(Y_1, \ldots, Y_n) \\
&= -Y_{n:n}
\end{aligned}
$$

FIGURE 12.5: Density function of the first-to-default time $\tau_{1:10}$


12.2.1 Fisher-Tippett theorem


We follow Embrechts et al. (1997) for the formulation of the Fisher-Tippett theorem. Let
$X_1, \ldots, X_n$ be a sequence of iid random variables, whose distribution function is F. If there
exist two constants $a_n$ and $b_n$ and a non-degenerate distribution function G such that:

$$
\lim_{n \to \infty} \Pr\left\{\frac{X_{n:n} - b_n}{a_n} \leq x\right\} = G(x)
\tag{12.4}
$$

then G can be classified as one of the following three types⁶:

Type I (Gumbel) $\Lambda(x) = \exp\left(-e^{-x}\right)$

Type II (Fréchet) $\Phi_\alpha(x) = \mathbb{1}(x > 0) \cdot \exp\left(-x^{-\alpha}\right)$

Type III (Weibull) $\Psi_\alpha(x) = \mathbb{1}(x < 0) \cdot \exp\left(-(-x)^{\alpha}\right)$

The distribution functions $\Lambda$, $\Phi_\alpha$ and $\Psi_\alpha$ are called extreme value distributions. The
Fisher-Tippett theorem is very important, because the set of extreme value distributions is very
small although the set of distribution functions is very large. We can draw a parallel with
the normal distribution and the sum of random variables. In some sense, the Fisher-Tippett
theorem provides an extreme value analog of the central limit theorem.


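⁶In terms of probability density functions, we have:

$$
g(x) =
\begin{cases}
\exp\left(-x - e^{-x}\right) & \text{(Gumbel)} \\
\mathbb{1}(x > 0) \cdot \alpha x^{-(1+\alpha)} \cdot \exp\left(-x^{-\alpha}\right) & \text{(Fréchet)} \\
\mathbb{1}(x < 0) \cdot \alpha (-x)^{\alpha-1} \cdot \exp\left(-(-x)^{\alpha}\right) & \text{(Weibull)}
\end{cases}
$$

Let us consider the case of exponential random variables, whose probability distribution
is $F(x) = 1 - \exp(-\lambda x)$. We have⁷:

$$
\lim_{n \to \infty} F_{n:n}(x) = \lim_{n \to \infty} \left(1 - e^{-\lambda x}\right)^n = 0
$$

We verify that the limit distribution is degenerate. If we consider the affine transformation
with $a_n = 1/\lambda$ and $b_n = (\ln n)/\lambda$, we obtain:

$$
\begin{aligned}
\Pr\left\{\frac{X_{n:n} - b_n}{a_n} \leq x\right\} &= \Pr\{X_{n:n} \leq a_n x + b_n\} \\
&= \left(1 - e^{-\lambda (a_n x + b_n)}\right)^n \\
&= \left(1 - e^{-x - \ln n}\right)^n \\
&= \left(1 - \frac{e^{-x}}{n}\right)^n
\end{aligned}
$$

We deduce that:

$$
G(x) = \lim_{n \to \infty} \left(1 - \frac{e^{-x}}{n}\right)^n = \exp\left(-e^{-x}\right)
$$

It follows that the limit distribution of the affine transformation is not degenerate. In Figure
12.6, we illustrate the convergence of $F^n(a_n x + b_n)$ to the Gumbel distribution $\Lambda(x)$.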
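The convergence of the normalized maximum of exponential variables to the Gumbel distribution can also be observed numerically. The following sketch is an illustration with names of our own choosing; it measures the largest absolute gap between $F^n(a_n x + b_n)$ and Λ(x) on an arbitrary grid of x values between −2 and 6.

```python
from math import exp, log


def normalized_max_cdf(x, n, lam=1.0):
    """Pr{(X_{n:n} - b_n) / a_n <= x} for iid exponential variables E(lam)."""
    a_n, b_n = 1.0 / lam, log(n) / lam
    y = a_n * x + b_n
    # The exponential distribution has no mass below zero
    return (1.0 - exp(-lam * y)) ** n if y > 0.0 else 0.0


def gumbel_cdf(x):
    """Lambda(x) = exp(-exp(-x))."""
    return exp(-exp(-x))


def max_abs_error(n, lam=1.0):
    """Largest gap between the normalized maximum and the Gumbel cdf on a grid."""
    grid = [-2.0 + 0.25 * j for j in range(33)]
    return max(abs(normalized_max_cdf(x, n, lam) - gumbel_cdf(x)) for x in grid)


for n in (1, 10, 100, 1000):
    print(n, max_abs_error(n))
```

The gap shrinks as n grows, which is the max-convergence shown in Figure 12.6. The result does not depend on λ, because the normalizing constants absorb it.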


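Example 127 If we consider the Pareto distribution, we have:

$$
F(x) = 1 - \left(\frac{x}{x_-}\right)^{-\alpha}
$$

The normalizing constants are $a_n = x_- n^{1/\alpha}$ and $b_n = 0$. We obtain:

$$
\Pr\left\{\frac{X_{n:n} - b_n}{a_n} \leq x\right\} = \left(1 - \left(\frac{x_- n^{1/\alpha} x}{x_-}\right)^{-\alpha}\right)^n = \left(1 - \frac{x^{-\alpha}}{n}\right)^n
$$

We deduce that the law of the maximum tends to the Fréchet distribution:

$$
\lim_{n \to \infty} \left(1 - \frac{x^{-\alpha}}{n}\right)^n = \exp\left(-x^{-\alpha}\right)
$$

⁷Because we have $0 \leq 1 - e^{-\lambda x} < 1$ for any finite $x \geq 0$.

FIGURE 12.6: Max-convergence of the exponential distribution $\mathcal{E}(1)$ to the Gumbel
distribution


Example 128 For the uniform distribution, the normalizing constants become $a_n = n^{-1}$
and $b_n = 1$ and we obtain the Weibull distribution with $\alpha = 1$:

$$
\lim_{n \to \infty} \Pr\left\{\frac{X_{n:n} - b_n}{a_n} \leq x\right\} = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = \exp(x)
$$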


12.2.2 Maximum domain of attraction 


The application of the Fisher-Tippett theorem is limited because it can be extremely
difficult to find the normalizing constants and the extreme value distribution for a given
probability distribution F. However, the graphical representation of $\Lambda$, $\Phi_\alpha$ and $\Psi_\alpha$ given
in Figure 12.7 already provides some information. For instance, the Weibull probability
distribution concerns random variables that are right bounded. This is why it has less
interest in finance than the Fréchet or Gumbel distribution functions⁸. We also notice some
differences in the shape of the curves. In particular, the Gumbel distribution is more 'normal'
than the Fréchet distribution, whose shape and tail depend on the parameter $\alpha$ (see Figure
12.8).


We say that the distribution function F belongs to the max-domain of attraction of
the distribution function G and we write $F \in \mathrm{MDA}(G)$ if the distribution function of
the normalized maximum converges to G. For instance, we have already seen that
$\mathcal{E}(\lambda) \in \mathrm{MDA}(\Lambda)$. In what follows, we indicate how to characterize the set $\mathrm{MDA}(G)$ and
what the normalizing constants are⁹.


⁸However, the Weibull probability distribution is related to the Fréchet probability distribution thanks
to the relationship $\Psi_\alpha(x) = \Phi_\alpha(-x^{-1})$.
⁹Most of the following results come from Resnick (1987).

FIGURE 12.7: Density function of $\Lambda$, $\Phi_1$ and $\Psi_1$


FIGURE 12.8: Density function of the Fréchet probability distribution

12.2.2.1 MDA of the Gumbel distribution
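$F \in \mathrm{MDA}(\Lambda)$ if and only if there exists a function h(t) such that:

$$
\lim_{t \to x_0} \frac{1 - F\left(t + x \cdot h(t)\right)}{1 - F(t)} = \exp(-x)
$$

where $x_0 \leq \infty$. The normalizing constants are then $a_n = h\left(F^{-1}(1 - n^{-1})\right)$ and
$b_n = F^{-1}(1 - n^{-1})$.
The previous characterization of $\mathrm{MDA}(\Lambda)$ is difficult to use because we have to define
the function h(t). However, we can show that if the distribution function F is $C^2$, a sufficient
condition is:

$$
\lim_{x \to \infty} \frac{\left(1 - F(x)\right) \cdot \partial_x^2 F(x)}{\left(\partial_x F(x)\right)^2} = -1
$$

For instance, in the case of the exponential distribution, we have $F(x) = 1 - \exp(-\lambda x)$,
$\partial_x F(x) = \lambda \exp(-\lambda x)$ and $\partial_x^2 F(x) = -\lambda^2 \exp(-\lambda x)$. We verify that:

$$
\lim_{x \to \infty} \frac{\left(1 - F(x)\right) \cdot \partial_x^2 F(x)}{\left(\partial_x F(x)\right)^2} = \lim_{x \to \infty} \frac{\exp(-\lambda x) \cdot \left(-\lambda^2 \exp(-\lambda x)\right)}{\left(\lambda \exp(-\lambda x)\right)^2} = -1
$$

If we consider the Gaussian distribution $\mathcal{N}(0,1)$, we have $F(x) = \Phi(x)$, $\partial_x F(x) = \phi(x)$
and $\partial_x^2 F(x) = -x \phi(x)$. Using L'Hospital's rule, we deduce that:

$$
\lim_{x \to \infty} \frac{\left(1 - F(x)\right) \cdot \partial_x^2 F(x)}{\left(\partial_x F(x)\right)^2} = \lim_{x \to \infty} -\frac{x \cdot \Phi(-x)}{\phi(x)} = -1
$$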
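The sufficient condition can be evaluated numerically for the two examples above. The sketch below is only an illustration with function names of our own. For the Gaussian case, the upper tail Φ(−x) is computed with the complementary error function `math.erfc`, because the naive expression 1 − Φ(x) loses all precision for large x.

```python
from math import erfc, exp, pi, sqrt


def gaussian_ratio(x):
    """(1 - F(x)) F''(x) / F'(x)^2 for N(0,1), that is -x * Phi(-x) / phi(x)."""
    upper_tail = 0.5 * erfc(x / sqrt(2.0))          # Phi(-x) = 1 - Phi(x)
    density = exp(-0.5 * x * x) / sqrt(2.0 * pi)    # phi(x)
    return -x * upper_tail / density


def exponential_ratio(x, lam=1.0):
    """Same ratio for the exponential distribution E(lam)."""
    survival = exp(-lam * x)
    first = lam * exp(-lam * x)
    second = -(lam**2) * exp(-lam * x)
    return survival * second / first**2


for x in (2.0, 5.0, 10.0, 30.0):
    print(x, gaussian_ratio(x), exponential_ratio(x))
```

The exponential ratio is equal to −1 for every x, whereas the Gaussian ratio only approaches −1 slowly, which gives a first hint that the convergence of Gaussian maxima to the Gumbel distribution is slow.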


12.2.2.2 MDA of the Fréchet distribution 


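We say that a function f is regularly varying with index $\alpha$ and we write $f \in \mathrm{RV}_\alpha$ if we
have:

$$
\lim_{t \to \infty} \frac{f(t \cdot x)}{f(t)} = x^\alpha
$$

for every x > 0. We can then show the following theorem: $F \in \mathrm{MDA}(\Phi_\alpha)$ if and only if
$1 - F \in \mathrm{RV}_{-\alpha}$, and the normalizing constants are $a_n = F^{-1}(1 - n^{-1})$ and $b_n = 0$.


Using the previous theorem, we deduce that the distribution function $F \in \mathrm{MDA}(\Phi_\alpha)$ if
it satisfies the following condition:

$$
\lim_{t \to \infty} \frac{1 - F(t \cdot x)}{1 - F(t)} = x^{-\alpha}
$$

If we apply this result to the Pareto distribution, we obtain:

$$
\lim_{t \to \infty} \frac{1 - F(t \cdot x)}{1 - F(t)} = \lim_{t \to \infty} \frac{\left(t \cdot x / x_-\right)^{-\alpha}}{\left(t / x_-\right)^{-\alpha}} = x^{-\alpha}
$$

We deduce that $1 - F \in \mathrm{RV}_{-\alpha}$, $F \in \mathrm{MDA}(\Phi_\alpha)$, $a_n = F^{-1}(1 - n^{-1}) = x_- n^{1/\alpha}$ and $b_n = 0$.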
Remark 147 The previous theorem suggests that:

$$
\frac{1 - F(t \cdot x)}{1 - F(t)} \approx x^{-\alpha}
$$

when t is sufficiently large. This means that we must observe a linear relationship between
ln(x) and ln(1 − F(t · x)):

$$
\ln\left(1 - F(t \cdot x)\right) \approx \ln\left(1 - F(t)\right) - \alpha \ln(x)
$$

This property can be used to check graphically if a given distribution function belongs or not
to the maximum domain of attraction of the Fréchet distribution. For instance, we observe
that $\mathcal{N}(0,1) \notin \mathrm{MDA}(\Phi_\alpha)$ in Figure 12.9, because the curve is not a straight line.

FIGURE 12.9: Graphical validation of the regular variation property for the normal
distribution $\mathcal{N}(0,1)$
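The graphical check of Remark 147 can be turned into a numerical one by computing the slopes of ln(1 − F(t · x)) against ln(x). The sketch below is an illustration only: the function names, the threshold t = 5 and the Pareto parameters (α = 3, x₋ = 1) are our own choices.

```python
from math import erfc, log, sqrt


def pareto_survival(x, alpha=3.0, x_min=1.0):
    """1 - F(x) for the Pareto distribution P(alpha, x_min)."""
    return (x / x_min) ** (-alpha)


def gaussian_survival(x):
    """1 - Phi(x), computed with erfc to keep precision in the tail."""
    return 0.5 * erfc(x / sqrt(2.0))


def log_log_slopes(survival, t, xs):
    """Slopes of ln(1 - F(t x)) against ln(x) between consecutive points of xs."""
    pts = [(log(x), log(survival(t * x))) for x in xs]
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(pts, pts[1:])]


xs = [1.0, 1.5, 2.0, 2.5, 3.0]
print(log_log_slopes(pareto_survival, 5.0, xs))     # constant slope, equal to -alpha
print(log_log_slopes(gaussian_survival, 5.0, xs))   # slope keeps changing
```

A constant slope is the signature of regular variation, and its value gives the tail index. For the Gaussian distribution the slope becomes steeper and steeper, so no straight line can be fitted.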


12.2.2.3 MDA of the Weibull distribution 

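For the Weibull distribution, we can show that $F \in \mathrm{MDA}(\Psi_\alpha)$ if and only if
$1 - F(x_0 - x^{-1}) \in \mathrm{RV}_{-\alpha}$ and $x_0 < \infty$. The normalizing constants are $a_n = x_0 - F^{-1}(1 - n^{-1})$
and $b_n = x_0$.


If we consider the uniform distribution with $x_0 = 1$, we have:

$$
F\left(x_0 - x^{-1}\right) = 1 - \frac{1}{x}
$$

and:

$$
\lim_{t \to \infty} \frac{1 - F\left(1 - t^{-1} x^{-1}\right)}{1 - F\left(1 - t^{-1}\right)} = \lim_{t \to \infty} \frac{t^{-1} x^{-1}}{t^{-1}} = x^{-1}
$$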


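We deduce that $F \in \mathrm{MDA}(\Psi_1)$, $a_n = 1 - F^{-1}(1 - n^{-1}) = n^{-1}$ and $b_n = 1$.

TABLE 12.4: Maximum domain of attraction and normalizing constants of some
distribution functions

Distribution | G(x) | $a_n$ | $b_n$
$\mathcal{E}(\lambda)$ | $\Lambda$ | $\lambda^{-1}$ | $\lambda^{-1} \ln n$
$\mathcal{G}(\alpha, \beta)$ | $\Lambda$ | $\beta^{-1}$ | $\beta^{-1}\left(\ln n + (\alpha - 1) \ln(\ln n) - \ln \Gamma(\alpha)\right)$
$\mathcal{N}(0,1)$ | $\Lambda$ | $(2 \ln n)^{-1/2}$ | $\dfrac{4 \ln n - \ln 4\pi - \ln(\ln n)}{2\sqrt{2 \ln n}}$
$\mathcal{LN}(\mu, \sigma^2)$ | $\Lambda$ | $\sigma (2 \ln n)^{-1/2}\, b_n$ | $\exp\left(\mu + \sigma\, \dfrac{4 \ln n - \ln 4\pi - \ln(\ln n)}{2\sqrt{2 \ln n}}\right)$
$\mathcal{P}(\alpha, x_-)$ | $\Phi_\alpha$ | $x_- n^{1/\alpha}$ | 0
$\mathcal{LG}(\alpha, \beta)$ | $\Phi_\beta$ | $\left(\dfrac{n (\ln n)^{\alpha-1}}{\Gamma(\alpha)}\right)^{1/\beta}$ | 0
$t_\nu$ | $\Phi_\nu$ | $T_\nu^{-1}(1 - n^{-1})$ | 0
$\mathcal{U}_{[0,1]}$ | $\Psi_1$ | $n^{-1}$ | 1
$\mathcal{B}(\alpha, \beta)$ | $\Psi_\beta$ | $\left(\dfrac{n\, \Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta + 1)}\right)^{-1/\beta}$ | 1

Source: Embrechts et al. (1997).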


12.2.2.4 Main results 


In Table 12.4, we report the maximum domain of attraction and normalizing constants
of some well-known distribution functions.


Remark 148 Let G(x) be the non-degenerate distribution of $X_{n:n}$. We note $a_n$ and $b_n$
the normalizing constants. We consider the linear transformation Y = cX + d with c > 0.
Because we have $Y_{n:n} = c X_{n:n} + d$, we deduce that:

$$
\begin{aligned}
G(x) &= \lim_{n \to \infty} \Pr\{X_{n:n} \leq a_n x + b_n\} \\
&= \lim_{n \to \infty} \Pr\left\{\frac{Y_{n:n} - d}{c} \leq a_n x + b_n\right\} \\
&= \lim_{n \to \infty} \Pr\{Y_{n:n} \leq a_n c x + b_n c + d\} \\
&= \lim_{n \to \infty} \Pr\{Y_{n:n} \leq a'_n x + b'_n\}
\end{aligned}
$$

where $a'_n = a_n c$ and $b'_n = b_n c + d$. This means that G(x) is also the non-degenerate
distribution of $Y_{n:n}$, and $a'_n$ and $b'_n$ are the normalizing constants. For instance, if we consider
the distribution function $\mathcal{N}(\mu, \sigma^2)$, we deduce that the normalizing constants are:

$$
a_n = \sigma (2 \ln n)^{-1/2}
$$

and:

$$
b_n = \mu + \sigma \left(\frac{4 \ln n - \ln 4\pi - \ln(\ln n)}{2\sqrt{2 \ln n}}\right)
$$

The normalizing constants are uniquely defined. In the case of the Gaussian distribution
$\mathcal{N}(0,1)$, they are equal to $a_n = h(b_n) = b_n / (1 + b_n^2)$ and $b_n = \Phi^{-1}(1 - n^{-1})$. In Table
12.4, we report an approximation which is not necessarily unique. For instance, Gasull et
al. (2015) propose the following alternative value of $b_n$:

$$
b_n \approx \sqrt{\ln\left(\frac{n^2}{2\pi}\right) - \ln\left(\ln\left(\frac{n^2}{2\pi}\right)\right) + \frac{\ln\left(0.5 + \ln n^2\right) - 2}{\ln n^2 - \ln 2\pi}}
$$

and show that this solution is more accurate than the classical approximation.
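The accuracy of the two approximations of $b_n$ for the Gaussian distribution can be compared with the exact value $\Phi^{-1}(1 - n^{-1})$. The sketch below is an illustration: the function names are ours, and the exact quantile is obtained from `statistics.NormalDist` in the Python standard library.

```python
from math import log, pi, sqrt
from statistics import NormalDist


def b_exact(n):
    """b_n = Phi^{-1}(1 - 1/n)."""
    return NormalDist().inv_cdf(1.0 - 1.0 / n)


def b_classical(n):
    """Approximation reported in Table 12.4 for N(0,1)."""
    ln_n = log(n)
    return (4.0 * ln_n - log(4.0 * pi) - log(ln_n)) / (2.0 * sqrt(2.0 * ln_n))


def b_gasull(n):
    """Alternative approximation of Gasull et al. (2015), as written above."""
    ln_n2 = log(n * n)
    c = ln_n2 - log(2.0 * pi)           # ln(n^2 / (2 pi))
    return sqrt(c - log(c) + (log(0.5 + ln_n2) - 2.0) / c)


for n in (10, 100, 1_000, 10_000):
    print(n, b_exact(n), b_classical(n), b_gasull(n))
```

For the sample sizes shown, the alternative formula stays much closer to the exact quantile than the classical one.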


12.2.3 Generalized extreme value distribution 
12.2.3.1 Definition 


From a statistical point of view, the previous results of the extreme value theory are
difficult to use. Indeed, there are many issues concerning the choice of the distribution
function, the normalizing constants or the convergence rate as explained by Coles (2001):


“The three types of limits that arise in Theorem 12.2.1 have distinct forms of
behavior, corresponding to the different forms of tail behaviour for the
distribution function F of the $X_i$. This can be made precise by considering the behavior
of the limit distribution G at $x_+$, its upper end-point. For the Weibull
distribution $x_+$ is finite, while for both the Fréchet and Gumbel distributions $x_+ = \infty$.
However, the density of G decays exponentially for the Gumbel distribution and
polynomially for the Fréchet distribution, corresponding to relatively different
rates of decay in the tail of F. It follows that in applications the three different
families give quite different representations of extreme value behavior. In early
applications of extreme value theory, it was usual to adopt one of the three
families, and then to estimate the relevant parameters of that distribution. But
there are two weaknesses: first, a technique is required to choose which of the three
families is most appropriate for the data at hand; second, once such a decision is
made, subsequent inferences presume this choice to be correct, and do not allow
for the uncertainty such a selection involves, even though this uncertainty may
be substantial”.


In practice, the statistical inference on extreme values takes another route. Indeed, the three
types can be combined into a single distribution function:

$$
G(x) = \exp\left(-\left(1 + \xi \left(\frac{x - \mu}{\sigma}\right)\right)^{-1/\xi}\right)
$$

defined on the support $\Delta = \left\{x : 1 + \xi \sigma^{-1} (x - \mu) > 0\right\}$. It is known as the 'generalized
extreme value' distribution and we denote it by $\mathrm{GEV}(\mu, \sigma, \xi)$. We obtain the following cases:

• the limit case $\xi \to 0$ corresponds to the Gumbel distribution;

• $\xi = \alpha^{-1} > 0$ defines the Fréchet distribution;

• the Weibull distribution is obtained by considering $\xi = -\alpha^{-1} < 0$.

We also notice that the parameters $\mu$ and $\sigma$ are the limits of the normalizing constants $b_n$
and $a_n$. The corresponding density function is equal to:

$$
g(x) = \frac{1}{\sigma} \left(1 + \xi \left(\frac{x - \mu}{\sigma}\right)\right)^{-(1+\xi)/\xi} \exp\left(-\left(1 + \xi \left(\frac{x - \mu}{\sigma}\right)\right)^{-1/\xi}\right)
$$

It is represented in Figure 12.10 for various values of the parameters. We notice that $\mu$ is a
location parameter, $\sigma$ controls the standard deviation and $\xi$ is related to the tail of
the distribution. The parameters can be estimated using the method of maximum likelihood
and we obtain:

$$
\ell_t = -\ln \sigma - \left(\frac{1 + \xi}{\xi}\right) \ln\left(1 + \xi \left(\frac{x_t - \mu}{\sigma}\right)\right) - \left(1 + \xi \left(\frac{x_t - \mu}{\sigma}\right)\right)^{-1/\xi}
$$

where $x_t$ is the observed maximum for the t-th period.


We consider again the example of the MSCI USA index. Using daily returns, we
calculate the block maximum for each period of 22 trading days. We then estimate the GEV
distribution using the method of maximum likelihood. For the period 1995-2015, we obtain
$\hat{\mu} = 0.0149$, $\hat{\sigma} = 0.0062$ and $\hat{\xi} = 0.3736$. In Figure 12.11, we compare the estimated GEV
distribution with the distribution function $F_{22:22}(x)$ when we assume that daily returns are
Gaussian. We notice that the Gaussian hypothesis largely underestimates extreme events
as illustrated by the quantile function in the table below:


a 90% 95% 96% 97% 98% 99% 
Gaussian 3.26% 3.56% 3.65% 3.76% 3.92% 4.17% 
GEV 3.66% 4.84% 5.28% 5.91% 6.92% 9.03% 


For instance, the probability is 1% to observe a maximum daily return during a period of 
one month larger than 4.17% in the case of the Gaussian distribution and 9.03% in the case 
of the GEV distribution. 
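The GEV row of the table can be approximately recovered from the three estimates reported above, using the closed-form quantile $G^{-1}(\alpha) = \mu - \sigma \xi^{-1}\left(1 - (-\ln \alpha)^{-\xi}\right)$ of the GEV distribution. The sketch below is an illustration with a function name of our own; since it uses the rounded estimates, small differences with the published figures are to be expected.

```python
from math import log


def gev_quantile(alpha, mu, sigma, xi):
    """Quantile of GEV(mu, sigma, xi) for xi != 0."""
    return mu - sigma / xi * (1.0 - (-log(alpha)) ** (-xi))


mu, sigma, xi = 0.0149, 0.0062, 0.3736   # rounded estimates reported in the text
for alpha in (0.90, 0.95, 0.96, 0.97, 0.98, 0.99):
    print(f"{alpha:.0%}: {gev_quantile(alpha, mu, sigma, xi):.2%}")
```

The positive value of ξ makes the quantile grow polynomially as α tends to one, which is why the GEV figures move away from the Gaussian ones at high confidence levels.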


12.2.3.2 Estimating the value-at-risk 


Let us consider a portfolio w, whose mark-to-market value is $P_t(w)$ at time t. We recall
that the P&L between t and t + 1 is equal to:

$$
\begin{aligned}
\Pi(w) &= P_{t+1}(w) - P_t(w) \\
&= P_t(w) \cdot R(w)
\end{aligned}
$$

where R(w) is the daily return of the portfolio. If we note $\hat{F}$ the estimated probability
distribution of R(w), the expression of the value-at-risk at the confidence level $\alpha$ is equal
to:

$$
\mathrm{VaR}_\alpha(w) = -P_t(w) \cdot \hat{F}^{-1}(1 - \alpha)
$$

We now estimate the GEV distribution $\hat{G}$ of the maximum of −R(w) for a period of
n trading days¹⁰. We have to define the confidence level $\alpha_{\mathrm{GEV}}$ when we consider block
minima of daily returns that corresponds to the same confidence level $\alpha$ when we consider
daily returns. For that, we assume that the two exception events have the same return
period, implying that:

$$
\frac{1}{1 - \alpha} \times 1 \text{ day} = \frac{1}{1 - \alpha_{\mathrm{GEV}}} \times n \text{ days}
$$

¹⁰We model the maximum of the opposite of daily returns, because we are interested in extreme losses,
and not in extreme profits.

FIGURE 12.10: Probability density function of the GEV distribution

FIGURE 12.11: Probability density function of the maximum return $R_{22:22}$

We deduce that:

$$
\alpha_{\mathrm{GEV}} = 1 - (1 - \alpha) \cdot n
$$

It follows that the value-at-risk calculated with the GEV distribution is equal to¹¹:

$$
\mathrm{VaR}_\alpha(w) = P(t) \cdot \hat{G}^{-1}(\alpha_{\mathrm{GEV}})
$$
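The mapping between the daily confidence level and the block confidence level, followed by the GEV quantile, can be sketched as follows. This is only an illustration: the function names are ours, and the GEV parameters used in the usage lines are hypothetical values, not estimates taken from the text.

```python
from math import log


def gev_confidence_level(alpha, n):
    """alpha_GEV = 1 - (1 - alpha) * n for blocks of n trading days."""
    return 1.0 - (1.0 - alpha) * n


def gev_value_at_risk(alpha, n, mu, sigma, xi, value=1.0):
    """VaR = P(t) * G^{-1}(alpha_GEV), where G is fitted to block maxima of -R(w)."""
    alpha_gev = gev_confidence_level(alpha, n)
    if not 0.0 < alpha_gev < 1.0:
        raise ValueError("the confidence level is too low for this block size")
    quantile = mu - sigma / xi * (1.0 - (-log(alpha_gev)) ** (-xi))
    return value * quantile


# A 99% daily confidence level corresponds to 78% for monthly blocks
print(gev_confidence_level(0.99, 22))
# Hypothetical GEV parameters for the block maxima of daily losses
for alpha in (0.99, 0.995, 0.999):
    print(alpha, gev_value_at_risk(alpha, 22, mu=0.015, sigma=0.007, xi=0.3))
```

The guard on $\alpha_{\mathrm{GEV}}$ reflects a practical limit of the approach: with blocks of 22 days, the daily confidence level must exceed 1 − 1/22 ≈ 95.5% for the mapping to produce a valid probability.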


We consider four portfolios invested in the MSCI USA index and the MSCI EM index: 
(1) long on the MSCI USA, (2) long on the MSCI EM index, (3) long on the MSCI USA 
and short on the MSCI EM index and (4) long on the MSCI EM index and short on the 
MSCI USA index. Using daily returns from January 1995 to December 2015, we estimate 
the daily value-at-risk of these portfolios for different confidence levels a. We report the 
results in Table 12.5 for Gaussian and historical value-at-risk measures and compare them 
with those calculated with the GEV approach. In this case, we estimate the parameters of 
the extreme value distribution using block maxima of 22 trading days. When we consider a 
99% confidence level, the lowest value is obtained by the GEV method followed by Gaussian 
and historical methods. For a higher quantile, the GEV VaR is between the Gaussian VaR 
and the historical VaR. The value-at-risk calculated with the GEV approach can therefore 
be interpreted as a parametric value-at-risk, which is estimated using only tail events. 


TABLE 12.5: Comparing Gaussian, historical and GEV value-at-risk measures

VaR | α | Long US | Long EM | Long US / Short EM | Long EM / Short US
Gaussian | 99.0% | 2.88% | 2.83% | 3.06% | 3.03%
Gaussian | 99.5% | 3.19% | 3.14% | 3.39% | 3.36%
Gaussian | 99.9% | 3.83% | 3.77% | 4.06% | 4.03%
Historical | 99.0% | 3.46% | 3.61% | 3.37% | 3.81%
Historical | 99.5% | 4.66% | 4.73% | 3.99% | 4.74%
Historical | 99.9% | 7.74% | 7.87% | 6.45% | 7.27%
GEV | 99.0% | 2.64% | 2.61% | 2.72% | 2.938%
GEV | 99.5% | 3.48% | 3.46% | 3.41% | 3.82%
GEV | 99.9% | 5.91% | 6.05% | 5.35% | 6.60%

12.2.4 Peak over threshold 
12.2.4.1 Definition 


The estimation of the GEV distribution is a 'block component-wise' approach. This
means that from a sample of random variates, we build a sample of maxima by considering
blocks with the same length. This implies a loss of information, because some blocks may
contain several extreme events whereas some other blocks may not be impacted by extremes.
Another approach consists in using the 'peak over threshold' (POT) method. In this case,
we are interested in estimating the distribution of exceedance over a certain threshold u:

$$
F_u(x) = \Pr\{X - u \leq x \mid X > u\}
$$

where $0 \leq x < x_0 - u$ and $x_0 = \sup\{x \in \mathbb{R} : F(x) < 1\}$. $F_u(x)$ is also called the conditional
excess distribution function. It is also equal to:

$$
\begin{aligned}
F_u(x) &= 1 - \Pr\{X - u > x \mid X > u\} \\
&= 1 - \frac{1 - F(u + x)}{1 - F(u)} \\
&= \frac{F(u + x) - F(u)}{1 - F(u)}
\end{aligned}
$$

Pickands (1975) showed that, for very large u, $F_u(x)$ follows a generalized Pareto
distribution (GPD): $F_u(x) \approx H(x)$ where¹²:

$$
H(x) = 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}
$$

The distribution function $\mathrm{GPD}(\sigma, \xi)$ depends on two parameters: $\sigma$ is the scale parameter
and $\xi$ is the shape parameter.


¹¹The inverse function of the probability distribution $\mathrm{GEV}(\mu, \sigma, \xi)$ is equal to:

$$
G^{-1}(\alpha) = \mu - \frac{\sigma}{\xi} \left(1 - (-\ln \alpha)^{-\xi}\right)
$$


Example 129 If F is an exponential distribution $\mathcal{E}(\lambda)$, we have:

$$
\frac{1 - F(u + x)}{1 - F(u)} = \exp(-\lambda x)
$$

This is the generalized Pareto distribution when $\sigma = 1/\lambda$ and $\xi \to 0$.


Example 130 If F is a uniform distribution, we have:

$$
\frac{1 - F(u + x)}{1 - F(u)} = 1 - \frac{x}{1 - u}
$$

It corresponds to the generalized Pareto distribution with the following parameters: $\sigma = 1 - u$
and $\xi = -1$.
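Both examples can be checked by comparing the conditional excess distribution function with the GPD distribution function. The sketch below is an illustration: the function names and the numerical values of λ, u and x are our own choices, and the case ξ = 0 is handled through its exponential limit.

```python
from math import exp


def gpd_cdf(x, sigma, xi):
    """H(x) = 1 - (1 + xi x / sigma)^(-1/xi), with limit 1 - exp(-x / sigma) when xi = 0."""
    if xi == 0.0:
        return 1.0 - exp(-x / sigma)
    return 1.0 - (1.0 + xi * x / sigma) ** (-1.0 / xi)


def excess_cdf(cdf, u, x):
    """F_u(x) = (F(u + x) - F(u)) / (1 - F(u))."""
    return (cdf(u + x) - cdf(u)) / (1.0 - cdf(u))


lam = 2.0
exponential_cdf = lambda z: 1.0 - exp(-lam * z)
uniform_cdf = lambda z: min(max(z, 0.0), 1.0)

# Example 129: exponential excesses are GPD(1 / lambda, 0) for any threshold
print(excess_cdf(exponential_cdf, 1.3, 0.4), gpd_cdf(0.4, 1.0 / lam, 0.0))
# Example 130: uniform excesses are GPD(1 - u, -1)
u = 0.6
print(excess_cdf(uniform_cdf, u, 0.1), gpd_cdf(0.1, 1.0 - u, -1.0))
```

In both cases the GPD representation is exact for every threshold, whereas for a general distribution it only holds asymptotically as u increases.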


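In fact, there is a strong link between the block maxima approach and the peak over
threshold method. Suppose that $X_{n:n} \sim \mathrm{GEV}(\mu, \sigma, \xi)$. It follows that:

$$
F^n(x) \approx \exp\left(-\left(1 + \xi \left(\frac{x - \mu}{\sigma}\right)\right)^{-1/\xi}\right)
$$

We deduce that:

$$
n \ln F(x) \approx -\left(1 + \xi \left(\frac{x - \mu}{\sigma}\right)\right)^{-1/\xi}
$$

Using the approximation $\ln F(x) \approx -(1 - F(x))$ for large x, we obtain:

$$
1 - F(x) \approx \frac{1}{n} \left(1 + \xi \left(\frac{x - \mu}{\sigma}\right)\right)^{-1/\xi}
$$

We find that $F_u(x)$ is a generalized Pareto distribution $\mathrm{GPD}(\tilde{\sigma}, \xi)$:

$$
\begin{aligned}
\Pr\{X > u + x \mid X > u\} &= \frac{1 - F(u + x)}{1 - F(u)} \\
&\approx \left(\frac{1 + \xi \sigma^{-1} (u + x - \mu)}{1 + \xi \sigma^{-1} (u - \mu)}\right)^{-1/\xi} \\
&= \left(1 + \frac{\xi x}{\tilde{\sigma}}\right)^{-1/\xi}
\end{aligned}
$$

where:

$$
\tilde{\sigma} = \sigma + \xi (u - \mu)
$$

¹²If $\xi \to 0$, we have $H(x) = 1 - \exp(-x/\sigma)$.

Therefore, we have a duality between GEV and GPD distribution functions: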


“[...] if block maxima have approximating distribution G, then threshold
excesses have a corresponding approximate distribution within the generalized
Pareto family. Moreover, the parameters of the generalized Pareto distribution
of threshold excesses are uniquely determined by those of the associated GEV
distribution of block maxima. In particular, the parameter $\xi$ is equal to that
of the corresponding GEV distribution. Choosing a different, but still large,
block size n would affect the values of the GEV parameters, but not those
of the corresponding generalized Pareto distribution of threshold excesses: $\xi$ is
invariant to block size, while the calculation of $\tilde{\sigma}$ is unperturbed by the changes
in $\mu$ and $\sigma$ which are self-compensating” (Coles, 2001, page 75).


The estimation of the parameters $(\sigma, \xi)$ is not obvious because it depends on the value taken by the threshold $u$. It must be sufficiently large to apply the previous theorem, but we also need enough data to obtain good estimates. We notice that the mean residual life $e(u)$ is a linear function of $u$:

$$
e(u) = \mathbb{E}[X - u \mid X > u] = \frac{\sigma + \xi u}{1 - \xi}
$$

when $\xi < 1$. If the GPD approximation is valid for a value $u_0$, it is therefore valid for any value $u > u_0$. To determine $u_0$, we can use a mean residual life plot, which consists in plotting $u$ against the empirical mean excess $\hat{e}(u)$:

$$
\hat{e}(u) = \frac{\sum_{i=1}^{n} (x_i - u)^{+}}{\sum_{i=1}^{n} \mathbb{1}\{x_i > u\}}
$$

Once $u_0$ is found, we estimate the parameters $(\sigma, \xi)$ by the method of maximum likelihood or the linear regression¹³.
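The regression route can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `mean_excess`, `fit_line` and `gpd_from_regression` are hypothetical, the fit is a plain least-squares line through the points $(u, \hat{e}(u))$, and the mapping from intercept and slope to $(\hat{\sigma}, \hat{\xi})$ follows from $e(u) = (\sigma + \xi u)/(1 - \xi)$.

```python
def mean_excess(losses, u):
    """Empirical mean excess: average of (x - u) over the observations above u."""
    excesses = [x - u for x in losses if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def gpd_from_regression(a, b):
    """Map intercept a = sigma / (1 - xi) and slope b = xi / (1 - xi) to (sigma, xi)."""
    return a / (1.0 + b), b / (1.0 + b)


# Intercept and slope reported for the Long US portfolio in Table 12.6
sigma_hat, xi_hat = gpd_from_regression(0.834, 0.160)
print(round(sigma_hat, 3), round(xi_hat, 3))
```

In practice one would evaluate `mean_excess` on a grid of thresholds above $u_0$ and pass the resulting points to `fit_line`.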


Let us consider our previous example. In Figure 12.12, we have reported the mean
residual life plot for the left tail of the four portfolios¹⁴. The determination of $u_0$ consists
in finding linear relationships. We have a first linear relationship between $u = -3\%$ and
$u = -1\%$, but it is not valid because it is followed by a change in slope. We prefer to
consider that the linear relationship is valid for $u \ge 2\%$. By assuming that $u_0 = 2\%$ for all
the four portfolios, we obtain the estimates given in Table 12.6.


12.2.4.2 Estimating the expected shortfall 


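We recall that:

$$
F_u(x) = \frac{F(u + x) - F(u)}{1 - F(u)} \approx H(x)
$$

where $H \sim \mathrm{GPD}(\sigma, \xi)$. We deduce that:

$$
\begin{aligned}
F(x) &= F(u) + (1 - F(u)) \cdot F_u(x - u) \\
     &\approx F(u) + (1 - F(u)) \cdot H(x - u)
\end{aligned}
$$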


¹³ In this case, we estimate the linear model $\hat{e}(u) = a + b \cdot u + \varepsilon$ for $u \ge u_0$ and deduce that $\hat{\sigma} = \hat{a}/(1 + \hat{b})$ and $\hat{\xi} = \hat{b}/(1 + \hat{b})$.
¹⁴ This means that $\hat{e}(u)$ is calculated using the portfolio loss, that is the opposite of the portfolio return.

FIGURE 12.12: Mean residual life plot


TABLE 12.6: Estimation of the generalized Pareto distribution

Parameter    Long US    Long EM    Long US / Short EM    Long EM / Short US
â            0.834      1.029      0.394                 0.904
b̂            0.160      0.132      0.239                 0.142
σ̂            0.719      0.909      0.318                 0.792
ξ̂            0.138      0.117      0.193                 0.124


We consider a sample of size $n$. We note $n_u$ the number of observations whose value $x_i$ is larger than the threshold $u$. The non-parametric estimate of $F(u)$ is then equal to:

$$
\hat{F}(u) = 1 - \frac{n_u}{n}
$$

Therefore, we obtain the following semi-parametric estimate of $F(x)$ for $x$ larger than $u$:

$$
\begin{aligned}
\hat{F}(x) &= \hat{F}(u) + \left(1 - \hat{F}(u)\right) \cdot \hat{H}(x - u) \\
           &= 1 - \frac{n_u}{n}\left(1 + \frac{\hat{\xi}\,(x - u)}{\hat{\sigma}}\right)^{-1/\hat{\xi}}
\end{aligned}
$$

We can interpret $\hat{F}(x)$ as the historical estimate of the probability distribution tail that is improved by the extreme value theory. We deduce that:


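$$
\begin{aligned}
\mathrm{VaR}_\alpha &= \hat{F}^{-1}(\alpha) \\
                    &= u + \frac{\hat{\sigma}}{\hat{\xi}}\left(\left(\frac{n}{n_u}(1 - \alpha)\right)^{-\hat{\xi}} - 1\right)
\end{aligned}
$$

and:

$$
\begin{aligned}
\mathrm{ES}_\alpha &= \mathbb{E}[X \mid X > \mathrm{VaR}_\alpha] \\
                   &= \mathrm{VaR}_\alpha + \mathbb{E}[X - \mathrm{VaR}_\alpha \mid X > \mathrm{VaR}_\alpha] \\
                   &= \mathrm{VaR}_\alpha + \frac{\hat{\sigma} + \hat{\xi}\,(\mathrm{VaR}_\alpha - u)}{1 - \hat{\xi}} \\
                   &= \frac{\mathrm{VaR}_\alpha}{1 - \hat{\xi}} + \frac{\hat{\sigma} - \hat{\xi} u}{1 - \hat{\xi}}
\end{aligned}
$$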


We consider again the example of the four portfolios with exposures on US and EM
equities. In the sample, we have 3815 observations, whereas the value taken by $n_u$ when
$u$ is equal to 2% is 171, 161, 174 and 195 respectively. Using the estimates given in Table
12.6, we calculate the daily value-at-risk and expected shortfall of the four portfolios. The
results are reported in Table 12.7. If we compare them with those obtained in Table 12.5
on page 773, we notice that the GPD VaR is close to the GEV VaR.


TABLE 12.7: Estimating value-at-risk and expected shortfall risk measures using the
generalized Pareto distribution

Risk measure   α        Long US    Long EM    Long US / Short EM    Long EM / Short US
VaR            99.0%    3.20%      3.42%      2.56%                 3.43%
VaR            99.5%    3.84%      4.20%      2.88%                 4.13%
VaR            99.9%    5.60%      6.26%      3.80%                 6.02%
ES             99.0%    4.22%      4.64%      3.09%                 4.54%
ES             99.5%    4.97%      5.52%      3.48%                 5.34%
ES             99.9%    7.01%      7.86%      4.62%                 7.49%
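As a quick check of the two closed-form expressions, the following Python sketch recomputes the 99% figures of the Long US portfolio from the rounded estimates of Table 12.6 ($\hat{\sigma} = 0.719$, $\hat{\xi} = 0.138$, $u = 2\%$, $n = 3815$, $n_u = 171$). The function names `gpd_var` and `gpd_es` are hypothetical, and small differences with Table 12.7 may come from rounding of the inputs.

```python
def gpd_var(alpha, u, sigma, xi, n, n_u):
    """Value-at-risk implied by the semi-parametric GPD tail estimate."""
    return u + sigma / xi * ((n / n_u * (1.0 - alpha)) ** (-xi) - 1.0)


def gpd_es(alpha, u, sigma, xi, n, n_u):
    """Expected shortfall implied by the GPD tail estimate (requires xi < 1)."""
    var = gpd_var(alpha, u, sigma, xi, n, n_u)
    return var / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)


# Long US portfolio, quantities expressed in percent
var_99 = gpd_var(0.99, u=2.0, sigma=0.719, xi=0.138, n=3815, n_u=171)
es_99 = gpd_es(0.99, u=2.0, sigma=0.719, xi=0.138, n=3815, n_u=171)
print(f"VaR 99% = {var_99:.2f}%, ES 99% = {es_99:.2f}%")
```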


12.3 Multivariate extreme value theory 


The extreme value theory is generally formulated and used in the univariate case. It can
be easily extended to the multivariate case, but its implementation is more difficult. This
section is essentially based on the works of Deheuvels (1978), Galambos (1987) and Joe
(1997).

12.3.1 Multivariate extreme value distributions
12.3.1.1 Extreme value copulas


An extreme value (EV) copula satisfies the following relationship:

$$
C(u_1^t, \ldots, u_n^t) = C^t(u_1, \ldots, u_n)
$$

for all $t > 0$. For instance, the Gumbel copula is an EV copula:

$$
\begin{aligned}
C(u_1^t, u_2^t) &= \exp\left(-\left((-\ln u_1^t)^\theta + (-\ln u_2^t)^\theta\right)^{1/\theta}\right) \\
                &= \exp\left(-\left(t^\theta\left((-\ln u_1)^\theta + (-\ln u_2)^\theta\right)\right)^{1/\theta}\right) \\
                &= \left(\exp\left(-\left((-\ln u_1)^\theta + (-\ln u_2)^\theta\right)^{1/\theta}\right)\right)^t \\
                &= C^t(u_1, u_2)
\end{aligned}
$$

but it is not the case of the Farlie-Gumbel-Morgenstern copula:

$$
\begin{aligned}
C(u_1^t, u_2^t) &= u_1^t u_2^t + \theta u_1^t u_2^t (1 - u_1^t)(1 - u_2^t) \\
                &= u_1^t u_2^t \left(1 + \theta - \theta u_1^t - \theta u_2^t + \theta u_1^t u_2^t\right) \\
                &\neq u_1^t u_2^t \left(1 + \theta - \theta u_1 - \theta u_2 + \theta u_1 u_2\right)^t \\
                &= C^t(u_1, u_2)
\end{aligned}
$$
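The defining relationship can also be checked numerically. The sketch below, with hypothetical function names, evaluates the gap $|C(u_1^t, u_2^t) - C^t(u_1, u_2)|$ at one point: it vanishes (up to rounding) for the Gumbel copula and is clearly non-zero for the Farlie-Gumbel-Morgenstern copula.

```python
import math


def gumbel_copula(u1, u2, theta):
    """Gumbel copula with parameter theta >= 1."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def fgm_copula(u1, u2, theta):
    """Farlie-Gumbel-Morgenstern copula with parameter theta in [-1, 1]."""
    return u1 * u2 * (1.0 + theta * (1.0 - u1) * (1.0 - u2))


def ev_gap(copula, u1, u2, t, theta):
    """Absolute difference between C(u1^t, u2^t) and C(u1, u2)^t."""
    return abs(copula(u1 ** t, u2 ** t, theta) - copula(u1, u2, theta) ** t)


print(ev_gap(gumbel_copula, 0.3, 0.7, t=2.0, theta=1.8))  # numerically zero
print(ev_gap(fgm_copula, 0.3, 0.7, t=2.0, theta=0.5))     # strictly positive
```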


The term ‘extreme value copula’ suggests a relationship between the extreme value
theory and these copula functions. Let $X = (X_1, \ldots, X_n)$ be a random vector of dimension
$n$. We note $X_{m:m}$ the random vector of maxima:

$$
X_{m:m} = \begin{pmatrix} X_{m:m,1} \\ \vdots \\ X_{m:m,n} \end{pmatrix}
$$

and $F_{m:m}$ the corresponding distribution function:

$$
F_{m:m}(x_1, \ldots, x_n) = \Pr\{X_{m:m,1} \le x_1, \ldots, X_{m:m,n} \le x_n\}
$$

The multivariate extreme value (MEV) theory considers the asymptotic behavior of the
non-degenerate distribution function $G$ such that:

$$
\lim_{m \to \infty} \Pr\left(\frac{X_{m:m,1} - b_{m,1}}{a_{m,1}} \le x_1, \ldots, \frac{X_{m:m,n} - b_{m,n}}{a_{m,n}} \le x_n\right) = G(x_1, \ldots, x_n)
$$

Using Sklar’s theorem, there exists a copula function $C(G)$ such that:

$$
G(x_1, \ldots, x_n) = C(G)\left(G_1(x_1), \ldots, G_n(x_n)\right)
$$

It is obvious that the marginals $G_1, \ldots, G_n$ satisfy the Fisher-Tippett theorem, meaning that
the marginals of a multivariate extreme value distribution can only be Gumbel, Fréchet or
Weibull distribution functions. For the copula $C(G)$, we have the following result: $C(G)$
is an extreme value copula.

With the copula representation, we can then easily define MEV distributions. For instance, if we consider the random vector $(X_1, X_2)$, whose joint distribution function is:

$$
F(x_1, x_2) = \exp\left(-\left((-\ln \Phi(x_1))^\theta + (-\ln x_2)^\theta\right)^{1/\theta}\right)
$$

we notice that $X_1$ is a Gaussian random variable and $X_2$ is a uniform random variable. We
conclude that the corresponding limit distribution function of maxima is:

$$
G(x_1, x_2) = \exp\left(-\left((-\ln \Lambda(x_1))^\theta + (-\ln \Psi_1(x_2))^\theta\right)^{1/\theta}\right)
$$

In Figure 12.13, we have reported the contour plot of four MEV distribution functions, whose
marginals are GEV(0, 1, 1) and GEV(0, 1, 1.5). For the dependence function, we consider the
Gumbel-Hougaard copula and calibrate the parameter $\theta$ with respect to Kendall’s tau.

[Panel titles recovered from the figure: τ = 0.00, τ = 0.50]

FIGURE 12.13: Multivariate extreme value distributions


12.3.1.2 Deheuvels-Pickands representation 


Let $D$ be a multivariate distribution function, whose survival marginals are exponential and the dependence structure is an extreme value copula. By using the relationship¹⁵ $C(u_1, \ldots, u_n) = C(e^{-\tilde{u}_1}, \ldots, e^{-\tilde{u}_n}) = D(\tilde{u}_1, \ldots, \tilde{u}_n)$, we have $D^t(\tilde{u}) = D(t\tilde{u})$. Therefore, $D$ is a min-stable multivariate exponential (MSMVE) distribution.

We now introduce the Deheuvels/Pickands MSMVE representation. Let $D(\tilde{u})$ be a survival function with exponential marginals. $D$ satisfies the relationship:

$$
-\ln D(t \cdot \tilde{u}) = -t \cdot \ln D(\tilde{u}) \qquad \forall\, t > 0
$$

if and only if the representation of $D$ is:

$$
-\ln D(\tilde{u}) = \int \cdots \int_{\mathcal{S}_n} \max_{1 \le i \le n} (q_i \tilde{u}_i)\, \mathrm{d}S(q) \qquad \forall\, \tilde{u} \ge 0
$$

where $\mathcal{S}_n$ is the $n$-dimensional unit simplex and $S$ is a finite measure on $\mathcal{S}_n$. This is the formulation¹⁶ given by Joe (1997). Sometimes, the Deheuvels/Pickands representation is presented using a dependence function $B(w)$ defined by:

$$
\begin{aligned}
D(\tilde{u}) &= \exp\left(-\left(\sum_{i=1}^{n} \tilde{u}_i\right) B(w_1, \ldots, w_n)\right) \\
B(w) &= \int \cdots \int_{\mathcal{S}_n} \max_{1 \le i \le n} (q_i w_i)\, \mathrm{d}S(q)
\end{aligned}
$$

where $w_i = \left(\sum_{j=1}^{n} \tilde{u}_j\right)^{-1} \tilde{u}_i$. Tawn (1990) showed that $B$ is a convex function and satisfies the following condition:

$$
\max(w_1, \ldots, w_n) \le B(w_1, \ldots, w_n) \le 1 \qquad (12.5)
$$

We deduce that an extreme value copula satisfies the PQD property:

$$
C^{\perp} \prec C \prec C^{+}
$$

¹⁵ We recall that $\tilde{u} = -\ln u$.


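In the bivariate case, the formulation can be simplified because the convexity of $B$ and
the condition (12.5) are sufficient (Tawn, 1988). We have:

$$
\begin{aligned}
C(u_1, u_2) &= D(\tilde{u}_1, \tilde{u}_2) \\
            &= \exp\left(-(\tilde{u}_1 + \tilde{u}_2)\, B\left(\frac{\tilde{u}_1}{\tilde{u}_1 + \tilde{u}_2}, \frac{\tilde{u}_2}{\tilde{u}_1 + \tilde{u}_2}\right)\right) \\
            &= \exp\left(\ln(u_1 u_2)\, B\left(\frac{\ln u_1}{\ln(u_1 u_2)}, \frac{\ln u_2}{\ln(u_1 u_2)}\right)\right) \\
            &= \exp\left(\ln(u_1 u_2)\, A\left(\frac{\ln u_1}{\ln(u_1 u_2)}\right)\right)
\end{aligned}
$$

where $A(w) = B(w, 1 - w)$. $A$ is a convex function where $A(0) = A(1) = 1$ and satisfies
$\max(w, 1 - w) \le A(w) \le 1$.


Example 131 For the Gumbel copula, we have:

$$
\begin{aligned}
-\ln D(\tilde{u}_1, \tilde{u}_2) &= \left(\tilde{u}_1^\theta + \tilde{u}_2^\theta\right)^{1/\theta} \\
B(w_1, w_2) &= \frac{\left(\tilde{u}_1^\theta + \tilde{u}_2^\theta\right)^{1/\theta}}{\tilde{u}_1 + \tilde{u}_2} = \left(w_1^\theta + w_2^\theta\right)^{1/\theta} \\
A(w) &= \left(w^\theta + (1 - w)^\theta\right)^{1/\theta}
\end{aligned}
$$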
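To make the bivariate representation concrete, here is a small Python sketch (hypothetical function names) that rebuilds a copula from its dependence function $A$ and compares it with the closed-form Gumbel copula. It also checks the bounds $\max(w, 1 - w) \le A(w) \le 1$ on a grid.

```python
import math


def pickands_gumbel(w, theta):
    """Dependence function A(w) of the Gumbel copula."""
    return (w ** theta + (1.0 - w) ** theta) ** (1.0 / theta)


def copula_from_pickands(u1, u2, dependence):
    """C(u1, u2) = exp(ln(u1 u2) * A(ln u1 / ln(u1 u2))) for u1, u2 in (0, 1)."""
    log_prod = math.log(u1 * u2)
    return math.exp(log_prod * dependence(math.log(u1) / log_prod))


def gumbel_copula(u1, u2, theta):
    """Closed-form Gumbel copula, used here only as a reference value."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


theta = 2.5
rebuilt = copula_from_pickands(0.4, 0.8, lambda w: pickands_gumbel(w, theta))
print(rebuilt, gumbel_copula(0.4, 0.8, theta))
```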


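¹⁶ Note that it is similar to Proposition 5.11 of Resnick (1987), although the author does not use copulas.

We verify that a bivariate EV copula satisfies the PQD property:

$$
\begin{aligned}
& \max(w, 1 - w) \le A(w) \le 1 \\
\Leftrightarrow\ & \max\left(\frac{\ln u_1}{\ln(u_1 u_2)}, \frac{\ln u_2}{\ln(u_1 u_2)}\right) \le A\left(\frac{\ln u_1}{\ln(u_1 u_2)}\right) \le 1 \\
\Leftrightarrow\ & \min(\ln u_1, \ln u_2) \ge \ln(u_1 u_2) \cdot A\left(\frac{\ln u_1}{\ln(u_1 u_2)}\right) \ge \ln(u_1 u_2) \\
\Leftrightarrow\ & \min(u_1, u_2) \ge \exp\left(\ln(u_1 u_2) \cdot A\left(\frac{\ln u_1}{\ln(u_1 u_2)}\right)\right) \ge u_1 u_2 \\
\Leftrightarrow\ & C^{+} \succ C \succ C^{\perp}
\end{aligned}
$$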


When the extreme values are independent, we have $A(w) = 1$ whereas the case of perfect
dependence corresponds to $A(w) = \max(w, 1 - w)$:

$$
\begin{aligned}
C(u_1, u_2) &= \exp\left(\ln(u_1 u_2) \cdot \max\left(\frac{\ln u_1}{\ln(u_1 u_2)}, \frac{\ln u_2}{\ln(u_1 u_2)}\right)\right) \\
            &= \min(u_1, u_2) \\
            &= C^{+}(u_1, u_2)
\end{aligned}
$$

In Table 12.8, we have reported the dependence function $A(w)$ of the most used EV copula
functions.


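TABLE 12.8: List of extreme value copulas

Copula           θ           C(u_1, u_2)                                                                          A(w)
C^⊥                          $u_1 u_2$                                                                            $1$
Gumbel           [1, ∞)      $\exp\left(-(\tilde{u}_1^\theta + \tilde{u}_2^\theta)^{1/\theta}\right)$             $(w^\theta + (1 - w)^\theta)^{1/\theta}$
Gumbel II        [0, 1]      $u_1 u_2 \exp\left(\theta\,\frac{\tilde{u}_1 \tilde{u}_2}{\tilde{u}_1 + \tilde{u}_2}\right)$   $\theta w^2 - \theta w + 1$
Galambos         [0, ∞)      $u_1 u_2 \exp\left((\tilde{u}_1^{-\theta} + \tilde{u}_2^{-\theta})^{-1/\theta}\right)$         $1 - (w^{-\theta} + (1 - w)^{-\theta})^{-1/\theta}$
Hüsler-Reiss     [0, ∞)      $\exp\left(-\tilde{u}_1 \vartheta(u_1, u_2; \theta) - \tilde{u}_2 \vartheta(u_2, u_1; \theta)\right)$   $w\,\kappa(w; \theta) + (1 - w)\,\kappa(1 - w; \theta)$
Marshall-Olkin   [0, 1]²     $u_1^{1 - \theta_1} u_2^{1 - \theta_2} \min(u_1^{\theta_1}, u_2^{\theta_2})$         $\max(1 - \theta_1 w, 1 - \theta_2 (1 - w))$
C^+                          $\min(u_1, u_2)$                                                                     $\max(w, 1 - w)$

$\vartheta(u_1, u_2; \theta) = \Phi\left(\frac{1}{\theta} + \frac{\theta}{2} \ln\left(\ln u_1 / \ln u_2\right)\right)$
$\kappa(w; \theta) = \vartheta(w, 1 - w; \theta)$
Source: Ghoudi et al. (1998).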
12.3.2 Maximum domain of attraction 

Let $F$ be a multivariate distribution function whose marginals are $F_1, \ldots, F_n$ and the
copula is $C(F)$. We note $G$ the corresponding multivariate extreme value distribution,
$G_1, \ldots, G_n$ the marginals of $G$ and $C(G)$ the associated copula function. We can show that
$F \in \mathrm{MDA}(G)$ if and only if $F_i \in \mathrm{MDA}(G_i)$ for all $i = 1, \ldots, n$ and $C(F) \in \mathrm{MDA}(C(G))$.
Previously, we have seen how to characterize the max-domain of attraction in the univariate
case and how to calculate the normalizing constants. These constants remain the same in
the multivariate case, meaning that the only difficulty is to determine the EV copula $C(G)$.
We can show that $C(F) \in \mathrm{MDA}(C(G))$ if $C(F)$ satisfies the following relationship:

$$
\lim_{t \to \infty} C^t(F)\left(u_1^{1/t}, \ldots, u_n^{1/t}\right) = C(G)(u_1, \ldots, u_n)
$$

Moreover, if $C(F)$ is an EV copula, then $C(F) \in \mathrm{MDA}(C(F))$. This important result is
equivalent to:

$$
\lim_{u \to 0} \frac{1 - C(F)\left((1 - u)^{w_1}, \ldots, (1 - u)^{w_n}\right)}{u} = B(w_1, \ldots, w_n)
$$

In the bivariate case, we obtain:

$$
\lim_{u \to 0} \frac{1 - C(F)\left((1 - u)^{t}, (1 - u)^{1 - t}\right)}{u} = A(t)
$$

for all $t \in [0, 1]$.


Example 132 We consider the random vector $(X_1, X_2)$ defined by the following distribution function:

$$
F(x_1, x_2) = \left(\left(1 - e^{-x_1}\right)^{-\theta} + x_2^{-\theta} - 1\right)^{-1/\theta}
$$

on $[0, \infty] \times [0, 1]$. The marginals of $F(x_1, x_2)$ are $F_1(x_1) = F(x_1, 1) = 1 - e^{-x_1}$ and $F_2(x_2) = F(\infty, x_2) = x_2$. It follows that $X_1$ is an exponential random variable and $X_2$ is a uniform random variable. We know that:

$$
\lim_{n \to \infty} \Pr\left(\frac{X_{n:n,1} - \ln n}{1} \le x_1\right) = \Lambda(x_1)
$$

and:

$$
\lim_{n \to \infty} \Pr\left(\frac{X_{n:n,2} - 1}{n^{-1}} \le x_2\right) = \Psi_1(x_2)
$$

Since the dependence function of $F$ is the Clayton copula $C(F)(u_1, u_2) = \left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-1/\theta}$, we have:

$$
\begin{aligned}
\lim_{u \to 0} \frac{1 - C(F)\left((1 - u)^{t}, (1 - u)^{1 - t}\right)}{u}
  &= \lim_{u \to 0} \frac{1 - \left(1 + \theta u + o(u)\right)^{-1/\theta}}{u} \\
  &= \lim_{u \to 0} \frac{u + o(u)}{u} \\
  &= 1
\end{aligned}
$$

We deduce that $C(G) = C^{\perp}$. Finally, we obtain:

$$
\begin{aligned}
G(x_1, x_2) &= \lim_{n \to \infty} \Pr\left\{X_{n:n,1} - \ln n \le x_1,\ n\,(X_{n:n,2} - 1) \le x_2\right\} \\
            &= \Lambda(x_1) \cdot \Psi_1(x_2) \\
            &= \exp\left(-e^{-x_1}\right) \cdot \exp(x_2)
\end{aligned}
$$

If we change the copula $C(F)$, only the copula $C(G)$ is modified. For instance, when $C(F)$ is the Normal copula with parameter $\rho < 1$, then $G(x_1, x_2) = \exp\left(-e^{-x_1}\right) \cdot \exp(x_2)$. When the copula parameter $\rho$ is equal to 1, we obtain $G(x_1, x_2) = \min\left(\exp\left(-e^{-x_1}\right), \exp(x_2)\right)$. When $C(F)$ is the Gumbel copula, the MEV distribution becomes $G(x_1, x_2) = \exp\left(-\left(e^{-\theta x_1} + (-x_2)^\theta\right)^{1/\theta}\right)$.

12.3.3 Tail dependence of extreme values


We can show that the (upper) tail dependence of $C(G)$ is equal to the (upper) tail dependence of $C(F)$:

$$
\lambda^{+}(C(G)) = \lambda^{+}(C(F))
$$

This implies that extreme values are independent if the copula function $C(F)$ has no (upper) tail dependence.


12.4 Exercises 


12.4.1 Uniform order statistics


We assume that $X_1, \ldots, X_n$ are independent uniform random variables.

1. Show that the density function of the order statistic $X_{i:n}$ is:

$$
f_{i:n}(x) = \frac{\Gamma(n + 1)}{\Gamma(i)\,\Gamma(n - i + 1)}\, x^{i - 1} (1 - x)^{n - i}
$$

2. Calculate the mean $\mathbb{E}[X_{i:n}]$.

3. Show that the variance is equal to:

$$
\mathrm{var}(X_{i:n}) = \frac{i\,(n - i + 1)}{(n + 1)^2 (n + 2)}
$$

4. We consider 10 samples of 8 independent observations from the uniform probability distribution $\mathcal{U}_{[0,1]}$:

            Observation
Sample      1      2      3      4      5      6      7      8
   1       0.24   0.45   0.72   0.14   0.04   0.34   0.94   0.55
   2       0.12   0.32   0.69   0.64   0.31   0.25   0.97   0.57
   3       0.69   0.50   0.26   0.17   0.50   0.85   0.11   0.17
   4       0.53   0.00   0.77   0.58   0.98   0.15   0.98   0.03
   5       0.89   0.25   0.15   0.62   0.74   0.85   0.65   0.46
   6       0.74   0.65   0.86   0.05   0.93   0.15   0.25   0.07
   7       0.16   0.12   0.63   0.33   0.55   0.61   0.34   0.95
   8       0.96   0.82   0.01   0.87   0.57   0.11   0.14   0.47
   9       0.68   0.83   0.73   0.78   0.27   0.85   0.55   0.57
  10       0.89   0.94   0.91   0.28   0.99   0.40   0.99   0.68

For each sample, find the order statistics. Calculate the empirical mean and standard deviation of $X_{i:8}$ for $i = 1, \ldots, 8$ and compare these values with the theoretical results.

5. We assume that $n$ is odd, meaning that $n = 2k + 1$. We consider the median statistic $X_{k+1:n}$. Show that the density function of $X_{i:n}$ is right asymmetric if $i \le k$, symmetric about 0.5 if $i = k + 1$ and left asymmetric otherwise.

6. We now assume that the density function of $X_1, \ldots, X_n$ is symmetric. How are the results obtained in Question 5 impacted?

12.4.2 Order statistics and return period


1. Let $X$ and $F$ be the daily return of a portfolio and the associated probability distribution. We note $X_{n:n}$ the maximum of daily returns for a period of $n$ trading days. Using the standard assumptions, define the cumulative distribution function $F_{n:n}$ of $X_{n:n}$ if we suppose that $X \sim \mathcal{N}(\mu, \sigma^2)$.

2. How could we test the hypothesis $H_0 : X \sim \mathcal{N}(\mu, \sigma^2)$ using $F_{n:n}$?

3. Define the notion of return period. What is the return period associated to the statistics $F^{-1}(99\%)$, $F_{1:1}^{-1}(99\%)$, $F_{5:5}^{-1}(99\%)$ and $F_{21:21}^{-1}(99\%)$?

4. We consider the random variable $X_{20:20}$. Find the confidence level $\alpha$ which ensures that the return period associated to the quantile $F_{20:20}^{-1}(\alpha)$ is equivalent to the return period of the daily value-at-risk with a 99.9% confidence level.


12.4.3 Extreme order statistics of exponential random variables


1. We note $\tau \sim \mathcal{E}(\lambda)$. Show that:

$$
\Pr\{\tau > t \mid \tau > s\} = \Pr\{\tau > t - s\}
$$

where $t > s$. Comment on this result.

2. Let $\tau_i$ be the random variable of distribution $\mathcal{E}(\lambda_i)$. Calculate the probability distribution of $\min(\tau_1, \ldots, \tau_n)$ and $\max(\tau_1, \ldots, \tau_n)$ in the independent case. Show that:

$$
\Pr\{\min(\tau_1, \ldots, \tau_n) = \tau_i\} = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}
$$

3. Same question if the random variables $\tau_1, \ldots, \tau_n$ are comonotone.


12.4.4 Extreme value theory in the bivariate case

1. What is an extreme value (EV) copula $C$?

2. Show that $C^{\perp}$ and $C^{+}$ are EV copulas. Why can $C^{-}$ not be an EV copula?

3. We define the Gumbel-Hougaard copula as follows:

$$
C(u_1, u_2) = \exp\left(-\left[(-\ln u_1)^\theta + (-\ln u_2)^\theta\right]^{1/\theta}\right)
$$

with $\theta \ge 1$. Verify that it is an EV copula.

4. What is the definition of the upper tail dependence $\lambda$? What is its usefulness in multivariate extreme value theory?

5. Let $f(x)$ and $g(x)$ be two functions such that $\lim_{x \to x_0} f(x) = \lim_{x \to x_0} g(x) = 0$. If $g'(x_0) \neq 0$, L’Hospital’s rule states that:

$$
\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}
$$

Deduce that the upper tail dependence $\lambda$ of the Gumbel-Hougaard copula is $2 - 2^{1/\theta}$. What is the correlation of two extremes when $\theta = 1$?

6. We define the Marshall-Olkin copula as follows:

$$
C(u_1, u_2) = u_1^{1 - \theta_1} \cdot u_2^{1 - \theta_2} \cdot \min\left(u_1^{\theta_1}, u_2^{\theta_2}\right)
$$

where $(\theta_1, \theta_2) \in [0, 1]^2$.

(a) Verify that it is an EV copula.
(b) Find the upper tail dependence $\lambda$ of the Marshall-Olkin copula.
(c) What is the correlation of two extremes when $\min(\theta_1, \theta_2) = 0$?
(d) In which case are two extremes perfectly correlated?


12.4.5 Maximum domain of attraction in the bivariate case


1. We consider the following probability distributions:

Distribution                          F(x)
Exponential  $\mathcal{E}(\lambda)$        $1 - e^{-\lambda x}$
Uniform      $\mathcal{U}_{[0,1]}$         $x$
Pareto       $\mathcal{P}(\alpha, \theta)$ $1 - \left(\frac{\theta + x}{\theta}\right)^{-\alpha}$

For each distribution, we give the normalization parameters $a_n$ and $b_n$ of the Fisher-Tippett theorem and the corresponding limit probability distribution $G(x)$:

Distribution   a_n                                 b_n                                   G(x)
Exponential    $\lambda^{-1}$                      $\lambda^{-1} \ln n$                  $\Lambda(x) = e^{-e^{-x}}$
Uniform        $n^{-1}$                            $1 - n^{-1}$                          $\Psi_1(x - 1) = e^{x - 1}$
Pareto         $\theta \alpha^{-1} n^{1/\alpha}$   $\theta n^{1/\alpha} - \theta$        $\Phi_\alpha\left(1 + \frac{x}{\alpha}\right) = e^{-\left(1 + \frac{x}{\alpha}\right)^{-\alpha}}$

We note $G(x_1, x_2)$ the asymptotic distribution of the bivariate random vector $(X_{1,n:n}, X_{2,n:n})$ where $X_{1,i}$ (resp. $X_{2,i}$) are iid random variables.

(a) What is the expression of $G(x_1, x_2)$ when $X_{1,i}$ and $X_{2,i}$ are independent, $X_{1,i} \sim \mathcal{E}(\lambda)$ and $X_{2,i} \sim \mathcal{U}_{[0,1]}$?

(b) Same question when $X_{1,i} \sim \mathcal{E}(\lambda)$ and $X_{2,i} \sim \mathcal{P}(\alpha, \theta)$.

(c) Same question when $X_{1,i} \sim \mathcal{U}_{[0,1]}$ and $X_{2,i} \sim \mathcal{P}(\alpha, \theta)$.

2. What happens to the previous results when the dependence function between $X_{1,i}$ and $X_{2,i}$ is the Normal copula with parameter $\rho < 1$?

3. Same question when the parameter of the Normal copula is equal to one.

4. Find the expression of $G(x_1, x_2)$ when the dependence function is the Gumbel-Hougaard copula.




Chapter 13 


Monte Carlo Simulation Methods 


Monte Carlo methods consist of solving mathematical problems using random numbers.
The term ‘Monte Carlo’ was apparently coined by physicists Ulam and von Neumann at
Los Alamos in 1940 and refers to gambling casinos in Monaco¹. Until the end of the eighties,
Monte Carlo methods were principally used to perform numerical integration², including
the calculation of mathematical expectations. More recently, the Monte Carlo method designates all numerical methods that involve stochastic simulation and consider random experiments on a
computer.

This chapter is divided into three sections. In the first section, we present the different 
approaches to generate random numbers. Section two extends simulation methods when we 
manipulate stochastic processes. Finally, the last section is dedicated to Monte Carlo and 
quasi-Monte Carlo methods. 


13.1 Random variate generation 


Any Monte Carlo method is based on series of random variates that are independent
and identically distributed (iid) according to a given probability distribution $F$. As we will
see later, it can be done by generating uniform random numbers. This is why numerical
programming software already contains uniform random number generators. However, true
randomness is impossible to simulate with a computer. In practice, only sequences of ‘pseudorandom’ numbers can be produced with statistical properties that are close to those
obtained with iid random variables.


13.1.1 Generating uniform random numbers 


A first idea is to build a pseudorandom sequence S and repeat this sequence as often 
as necessary. For instance, for simulating 10 uniform random numbers, we can set S = 
{0,0.5, 1} and repeat this sequence four times. In this case, the 10 random numbers are: 


{0,0.5, 1,0,0.5, 1,0, 0.5, 1, 0} 


We notice that the period length of this sequence is three. The quality of the pseudorandom
number generator depends on the period length, which should be large in order to avoid
duplication and serial correlation. If we calculate the second moment of $S$, we do not obtain
the variance of a uniform random variable $\mathcal{U}_{[0,1]}$. A good pseudorandom number generator
should therefore pass standard adequacy tests.

¹ Monte Carlo is one of the four quarters of Monaco and houses the famous casino.
² In this case, we speak about Monte Carlo integration methods.

The most famous and used algorithm is the linear congruential generator (LCG):

$$
\begin{aligned}
x_n &= (a \cdot x_{n-1} + c) \bmod m \\
u_n &= x_n / m
\end{aligned}
$$

where $a$ is the multiplicative constant, $c$ is the additive constant and $m$ is the modulus
(or the order of the congruence). To initialize the algorithm, we have to define the initial
number $x_0$, called the seed³. $\{x_1, x_2, \ldots, x_n\}$ is a sequence of pseudorandom integer numbers
($0 \le x_n < m$) whereas $\{u_1, u_2, \ldots, u_n\}$ is a sequence of uniform random variates. We can
show that the maximum period⁴ is $m$ and can be only achieved for some specific values of
$a$, $c$ and $m$. The quality of the random number generator will then depend on the values of
the parameters.


Example 133 If we consider that $a = 3$, $c = 0$, $m = 11$ and $x_0 = 1$, we obtain the following
sequence:
{1, 3, 9, 5, 4, 1, 3, 9, 5, 4, 1, 3, 9, 5, 4, ...}


The period length is only five, meaning that only five uniform random variates can be generated: 0.09091, 0.27273, 0.81818, 0.45455 and 0.36364.
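The recursion is short enough to be written out directly. The following Python sketch (the helper names `lcg` and `lcg_period` are hypothetical) reproduces the sequence of Example 133 and measures its period by waiting for the first repeated state.

```python
def lcg(a, c, m, seed, count):
    """Return the states x_1..x_count and the variates u_n = x_n / m."""
    states, variates = [], []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m
        states.append(x)
        variates.append(x / m)
    return states, variates


def lcg_period(a, c, m, seed):
    """Length of the cycle eventually entered by the generator (brute force)."""
    first_seen = {}
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


states, variates = lcg(a=3, c=0, m=11, seed=1, count=5)
print(states)                              # states following the seed x_0 = 1
print([round(u, 5) for u in variates])
print(lcg_period(a=3, c=0, m=11, seed=1))
```

The brute-force period search is only meant for small moduli; it is not practical for a modulus of the order of $2^{31}$.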


The minimal standard LCG proposed by Lewis et al. (1969) is defined by $a = 7^5$, $c = 0$
and $m = 2^{31} - 1$. In Table 13.1, we report two sequences generated with the seed values 1
and 123456. This generator is widely used in numerical programming languages. However,
its period length is equal to $m - 1 = 2^{31} - 2 \approx 2.15 \times 10^9$, which can be judged as insufficient
for some modern Monte Carlo applications. For instance, if we consider the LDA model in
operational risk with a Poisson distribution $\mathcal{P}(1000)$, we need approximately $10^{10}$ random
numbers for drawing the severity loss if the number of Monte Carlo simulations is set to
ten million. Another drawback is that LCG methods may exhibit lattice structures. For
instance, Figure 13.1 shows the dependogram between $u_{n-1}$ and $u_n$ when $a = 10$, $c = 0$
and $m = 2^{31} - 1$.


TABLE 13.1: Simulation of 10 uniform pseudorandom numbers

           Seed 1                      Seed 123456
 n         x_n           u_n           x_n           u_n
 0                  1    0.000000          123456    0.000057
 1              16807    0.000008      2074924992    0.966212
 2          282475249    0.131538       277396911    0.129173
 3         1622650073    0.755605        22885540    0.010657
 4          984943658    0.458650       237697967    0.110687
 5         1144108930    0.532767       670147949    0.312062
 6          470211272    0.218959      1772333975    0.825307
 7          101027544    0.047045      2018933935    0.940139
 8         1457850878    0.678865      1981022945    0.922486
 9         1458777923    0.679296       466173527    0.217079
10         2007237709    0.934693       958124033    0.446161

³ If the seed is not specified, programming software generally uses the clock of the computer to generate the initial value.
⁴ It is equal to $m - 1$ if $c = 0$.

FIGURE 13.1: Lattice structure of the linear congruential generator

Nowadays, with a 64-bit computer, the maximum period of a LCG algorithm is $2^{64} - 1 \approx 1.85 \times 10^{19}$. To obtain a larger period length, one can use more sophisticated methods. For instance, multiple recursive generators are based on the following transition equation:

$$
x_n = \left(\sum_{i=1}^{k} a_i x_{n-i} + c\right) \bmod m
$$

To obtain a bigger period, we can also combine LCG algorithms with different periods.
For instance, the famous MRG32k3a generator of L’Ecuyer (1999) uses two 32-bit multiple recursive generators:

$$
\left\{
\begin{aligned}
x_n &= (1403580 \cdot x_{n-2} - 810728 \cdot x_{n-3}) \bmod m_1 \\
y_n &= (527612 \cdot y_{n-1} - 1370589 \cdot y_{n-3}) \bmod m_2
\end{aligned}
\right.
$$

where $m_1 = 2^{32} - 209$ and $m_2 = 2^{32} - 22853$. The uniform random variate is then equal to:

$$
u_n = \frac{x_n - y_n + \mathbb{1}\{x_n \le y_n\} \cdot m_1}{m_1 + 1}
$$

L’Ecuyer (1999) showed that the period length of this generator is equal to $2^{191} \approx 3 \times 10^{57}$.
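The two recursions and the combination step translate almost literally into code. The Python sketch below is a plain transcription of the equations above, not a reference implementation; the function name and the way the three-value seeds are passed are assumptions made for illustration.

```python
M1 = 2 ** 32 - 209
M2 = 2 ** 32 - 22853


def mrg32k3a(seed_x, seed_y, count):
    """Generate `count` uniform variates.

    seed_x = (x_{n-3}, x_{n-2}, x_{n-1}) and seed_y = (y_{n-3}, y_{n-2}, y_{n-1});
    each seed must contain integers below its modulus and must not be all zeros.
    """
    x, y = list(seed_x), list(seed_y)
    variates = []
    for _ in range(count):
        # Python's % always returns a non-negative remainder
        x_new = (1403580 * x[1] - 810728 * x[0]) % M1
        y_new = (527612 * y[2] - 1370589 * y[0]) % M2
        x = [x[1], x[2], x_new]
        y = [y[1], y[2], y_new]
        z = x_new - y_new + (M1 if x_new <= y_new else 0)
        variates.append(z / (M1 + 1))
    return variates


print(mrg32k3a((12345, 12345, 12345), (12345, 12345, 12345), 3))
```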


13.1.2 Generating non-uniform random numbers 
We now consider $X$ a random variable whose distribution function is noted $F$. There
are many ways to simulate $X$, but all of them are based on uniform random variates.

13.1.2.1 Method of inversion


Continuous random variables We assume that $F$ is continuous. Let $Y = F(X)$ be the
integral transform of $X$. Its cumulative distribution function $G$ is equal to:

$$
\begin{aligned}
G(y) &= \Pr\{Y \le y\} \\
     &= \Pr\{F(X) \le y\} \\
     &= \Pr\{X \le F^{-1}(y)\} \\
     &= F(F^{-1}(y)) \\
     &= y
\end{aligned}
$$

where $G(0) = 0$ and $G(1) = 1$. We deduce that $F(X)$ has a uniform distribution $\mathcal{U}_{[0,1]}$. It
follows that if $U$ is a uniform random variable, then $F^{-1}(U)$ is a random variable whose
distribution function is $F$. To simulate a sequence of random variates $\{x_1, \ldots, x_n\}$, we
can simulate a sequence of uniform random variates $\{u_1, \ldots, u_n\}$ and apply the transform
$x_i \leftarrow F^{-1}(u_i)$.


Example 134 If we consider the generalized uniform distribution $\mathcal{U}_{[a,b]}$, we have $F(x) = (x - a)/(b - a)$ and $F^{-1}(u) = a + (b - a)\,u$. The simulation of random variates $x_i$ is
deduced from the uniform random variates $u_i$ by using the following transform:

$$
x_i \leftarrow a + (b - a)\,u_i
$$

Example 135 In the case of the exponential distribution $\mathcal{E}(\lambda)$, we have $F(x) = 1 - \exp(-\lambda x)$. We deduce that:

$$
x_i \leftarrow -\frac{\ln(1 - u_i)}{\lambda}
$$

Since $1 - U$ is also a uniform distributed random variable, we have:

$$
x_i \leftarrow -\frac{\ln(u_i)}{\lambda}
$$

Example 136 In the case of the Pareto distribution $\mathcal{P}(\alpha, x_{-})$, we have $F(x) = 1 - (x/x_{-})^{-\alpha}$ and $F^{-1}(u) = x_{-}\,(1 - u)^{-1/\alpha}$. We deduce that:

$$
x_i \leftarrow \frac{x_{-}}{(1 - u_i)^{1/\alpha}}
$$
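Examples 135 and 136 can be turned into a short Python sketch. The function names are hypothetical, and the standard library generator `random.Random` stands in for the uniform random number generator.

```python
import math
import random


def exponential_inverse(u, lam):
    """Inverse distribution function of the exponential distribution E(lam)."""
    return -math.log(1.0 - u) / lam


def pareto_inverse(u, alpha, x_minus):
    """Inverse distribution function of the Pareto distribution P(alpha, x_minus)."""
    return x_minus * (1.0 - u) ** (-1.0 / alpha)


def simulate(inverse_cdf, size, rng, **params):
    """Apply x_i <- F^{-1}(u_i) to `size` uniform variates drawn in [0, 1)."""
    return [inverse_cdf(rng.random(), **params) for _ in range(size)]


rng = random.Random(123)
sample = simulate(exponential_inverse, 10000, rng, lam=2.0)
print(sum(sample) / len(sample))  # should be close to 1 / lam = 0.5
```

Using $1 - u$ rather than $u$ inside the logarithm avoids evaluating $\ln 0$, because `random()` returns values in $[0, 1)$.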


The method of inversion is easy to implement when we know the analytical expression
of $F^{-1}$. When it is not the case, we use the Newton-Raphson algorithm:

$$
x_i^{m+1} = x_i^{m} + \frac{u_i - F(x_i^{m})}{f(x_i^{m})}
$$

where $x_i^{m}$ is the solution of the equation $F(x) = u_i$ at the iteration $m$ and $f$ is the density function of $F$. For instance, if we
apply this algorithm to the Gaussian distribution $\mathcal{N}(0, 1)$, we have:

$$
x_i^{m+1} = x_i^{m} + \frac{u_i - \Phi(x_i^{m})}{\phi(x_i^{m})}
$$
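A minimal Python sketch of this iteration for the Gaussian case is given below. The starting point $x^0 = 0$, the tolerance and the iteration cap are assumptions chosen for illustration, and the sketch is not intended for values of $u$ extremely close to 0 or 1.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_inverse_newton(u, x0=0.0, tol=1e-12, max_iter=100):
    """Solve norm_cdf(x) = u with the Newton-Raphson iteration."""
    x = x0
    for _ in range(max_iter):
        step = (u - norm_cdf(x)) / norm_pdf(x)
        x += step
        if abs(step) < tol:
            break
    return x


print(norm_inverse_newton(0.975))  # close to the usual 1.96 quantile
```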
Discrete random variables In the case of a discrete probability distribution
$\{(x_1, p_1), (x_2, p_2), \ldots, (x_n, p_n)\}$ where $x_1 < x_2 < \ldots < x_n$, we have:

$$
F^{-1}(u) = \begin{cases}
x_1 & \text{if } 0 \le u \le p_1 \\
x_2 & \text{if } p_1 < u \le p_1 + p_2 \\
\vdots & \\
x_n & \text{if } \sum_{k=1}^{n-1} p_k < u \le 1
\end{cases}
$$

In Figure 13.2, we illustrate the method of inversion when the random variable is discrete.
We assume that:

x_i       1     2     4     6     7     9     10
p_i      10%   20%   10%    5%   20%   30%    5%
F(x_i)   10%   30%   40%   45%   65%   95%   100%

Because the cumulative distribution function is not continuous, the inverse function is a
step function. If we suppose that the uniform random number is 0.5517, we deduce that the
corresponding random number for the variable $X$ is equal to 7.
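The step-function inverse amounts to a search in the cumulative probabilities. The sketch below (hypothetical function name) uses a binary search from the Python standard library and reproduces the value obtained for $u = 0.5517$.

```python
from bisect import bisect_left
from itertools import accumulate


def discrete_inverse(u, values, probabilities):
    """Return the smallest value x_k whose cumulative probability is at least u."""
    cumulative = list(accumulate(probabilities))
    index = bisect_left(cumulative, u)
    # Guard against rounding in the last cumulative probability
    return values[min(index, len(values) - 1)]


values = [1, 2, 4, 6, 7, 9, 10]
probabilities = [0.10, 0.20, 0.10, 0.05, 0.20, 0.30, 0.05]
print(discrete_inverse(0.5517, values, probabilities))
```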


FIGURE 13.2: Inversion method when X is a discrete random variable


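Example 137 If we apply the method of inversion to the Bernoulli distribution $\mathcal{B}(p)$, we
have:

$$
x_i \leftarrow \begin{cases}
0 & \text{if } 0 \le u_i \le 1 - p \\
1 & \text{if } 1 - p < u_i \le 1
\end{cases}
$$

or:

$$
x_i \leftarrow \begin{cases}
1 & \text{if } u_i \le p \\
0 & \text{if } u_i > p
\end{cases}
$$

Piecewise distribution functions A piecewise distribution function is defined as follows:

$$
F(x) = F_m(x) \quad \text{if } x \in \left]x^{\star}_{m-1}, x^{\star}_{m}\right]
$$

where $x^{\star}_{m}$ are the knots of the piecewise function and:

$$
F_{m+1}(x^{\star}_{m}) = F_m(x^{\star}_{m})
$$

In this case, the simulated value $x_i$ is obtained using a search algorithm:

$$
x_i \leftarrow F_m^{-1}(u_i) \quad \text{if } F(x^{\star}_{m-1}) < u_i \le F(x^{\star}_{m})
$$

This means that we have first to calculate the value of $F(x)$ for all the knots in order to
determine which inverse function $F_m^{-1}$ will be applied.


Let us consider the piecewise exponential model described on page 202. We reiterate
that the survival function has the following expression:

$$
S(t) = S(t^{\star}_{m-1})\, e^{-\lambda_m (t - t^{\star}_{m-1})} \quad \text{if } t \in \left]t^{\star}_{m-1}, t^{\star}_{m}\right]
$$

We know that $S(\tau) \sim U$. It follows that:

$$
t_i \leftarrow t^{\star}_{m-1} + \frac{1}{\lambda_m} \ln \frac{S(t^{\star}_{m-1})}{u_i} \quad \text{if } S(t^{\star}_{m}) < u_i \le S(t^{\star}_{m-1})
$$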


Example 138 We model the default time $\tau$ with the piecewise exponential model and the
following parameters:

$$
\lambda = \begin{cases}
5\% & \text{if } t \text{ is less than or equal to one year} \\
8\% & \text{if } t \text{ is between one and five years} \\
12\% & \text{if } t \text{ is larger than five years}
\end{cases}
$$

We have $S(0) = 1$, $S(1) = 0.9512$ and $S(5) = 0.6907$. We deduce that:

$$
t_i \leftarrow \begin{cases}
0 + (1/0.05) \cdot \ln(1/u_i) & \text{if } u_i \in [0.9512, 1] \\
1 + (1/0.08) \cdot \ln(0.9512/u_i) & \text{if } u_i \in [0.6907, 0.9512[ \\
5 + (1/0.12) \cdot \ln(0.6907/u_i) & \text{if } u_i \in [0, 0.6907[
\end{cases}
$$

In Table 13.2, we have reported five simulations $t_i$ of the default time $\tau$. For each simulation,
we indicate the values taken by $t^{\star}_{m-1}$, $S(t^{\star}_{m-1})$ and $\lambda_m$.

TABLE 13.2: Simulation of the piecewise exponential model

u_i       t*_{m-1}   S(t*_{m-1})   λ_m     t_i
0.9950    0          1.0000        0.05     0.1003
0.3035    5          0.6907        0.12    11.8531
0.5429    5          0.6907        0.12     7.0069
0.9140    1          0.9512        0.08     1.4991
0.7127    1          0.9512        0.08     4.6087
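The search over the knots can be sketched as follows in Python. The function name and the argument layout (interior knots plus one intensity per interval) are assumptions made for illustration; with the parameters of Example 138, the sketch reproduces the simulated default times of Table 13.2 up to rounding.

```python
import math


def piecewise_exponential_time(u, knots, intensities):
    """Invert the piecewise exponential survival function at u in (0, 1].

    knots holds the interior knots (here 1 and 5 years) and intensities holds
    one more element than knots, the last one applying beyond the final knot.
    """
    t_prev, s_prev = 0.0, 1.0
    for t_next, lam in zip(knots, intensities):
        s_next = s_prev * math.exp(-lam * (t_next - t_prev))
        if u > s_next:
            return t_prev + math.log(s_prev / u) / lam
        t_prev, s_prev = t_next, s_next
    return t_prev + math.log(s_prev / u) / intensities[-1]


for u in (0.9950, 0.3035, 0.5429, 0.9140, 0.7127):
    t = piecewise_exponential_time(u, knots=[1.0, 5.0], intensities=[0.05, 0.08, 0.12])
    print(f"u = {u:.4f}  t = {t:.4f}")
```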
13.1.2.2 Method of transformation


Let $\{Y_1, Y_2, \ldots\}$ be a vector of independent random variables. The simulation of the
random variable $X = g(Y_1, Y_2, \ldots)$ is straightforward if we know how to easily simulate
the random variables $Y_i$. We notice that the inversion method is a particular case of the
transform method, because we have:

$$
X = g(U) = F^{-1}(U)
$$

Example 139 The Binomial random variable is the sum of $n$ iid Bernoulli random variables:

$$
\mathcal{B}(n, p) = \sum_{i=1}^{n} \mathcal{B}_i(p)
$$

We can therefore simulate the Binomial random variate $x$ using $n$ uniform random numbers:

$$
x = \sum_{i=1}^{n} \mathbb{1}\{u_i \le p\}
$$

If we would like to simulate the chi-squared random variable $\chi^2(\nu)$, we can use the
following relationship:

$$
\chi^2(\nu) = \sum_{i=1}^{\nu} \chi_i^2(1) = \sum_{i=1}^{\nu} \left(\mathcal{N}_i(0, 1)\right)^2
$$

We can therefore simulate the $\chi^2(\nu)$ random variate with $\nu$ independent Gaussian random
numbers $\mathcal{N}(0, 1)$. For that, we generally use the Box-Muller algorithm, which states that if
$U_1$ and $U_2$ are two independent uniform random variables, then $X_1$ and $X_2$ defined by:

$$
\begin{aligned}
X_1 &= \sqrt{-2 \ln U_1} \cdot \cos(2\pi U_2) \\
X_2 &= \sqrt{-2 \ln U_1} \cdot \sin(2\pi U_2)
\end{aligned}
$$

are independent and follow the Gaussian distribution $\mathcal{N}(0, 1)$.
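A minimal Python sketch of the Box-Muller transform and of its use for the chi-squared distribution is given below. The function names are hypothetical, and only the first Gaussian variate of each pair is used in `chi_squared_variate`, which wastes half of the draws but keeps the code short.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniform variates (u1 in (0, 1]) to two independent N(0, 1) variates."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def chi_squared_variate(nu, rng):
    """Sum of nu squared Gaussian variates."""
    total = 0.0
    for _ in range(nu):
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] to avoid log(0)
        x1, _unused = box_muller(u1, rng.random())
        total += x1 * x1
    return total


rng = random.Random(2024)
draws = [chi_squared_variate(3, rng) for _ in range(10000)]
print(sum(draws) / len(draws))  # should be close to nu = 3
```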


Remark 149 To simulate a Student’s t random variate $x$ with $\nu$ degrees of freedom, we
need $\nu + 1$ normal independent random variables $n_i$:

$$
x \leftarrow \frac{n_{\nu+1}}{\sqrt{\nu^{-1} \sum_{i=1}^{\nu} n_i^2}}
$$

However, this method is not efficient and we generally prefer to use the Bailey algorithm
based on the polar transformation⁵.


On page 339, we have seen that if $N_t$ is a Poisson process with intensity $\lambda$, the duration
$T$ between two consecutive events is an exponential distributed random variable. We have:

$$
\Pr(T \le t) = 1 - e^{-\lambda t}
$$

Since the durations are independent, we have:

$$
T_1 + T_2 + \ldots + T_n = \sum_{i=1}^{n} E_i
$$

where $E_i \sim \mathcal{E}(\lambda)$. Because the Poisson random variable is the number of events that occur
in the unit interval of time, we also have:

$$
X = \max\{n : T_1 + T_2 + \ldots + T_n \le 1\}
$$

We notice that:

$$
\begin{aligned}
X &= \max\left\{n : -\frac{1}{\lambda} \sum_{i=1}^{n} \ln U_i \le 1\right\} \\
  &= \max\left\{n : \sum_{i=1}^{n} \ln U_i \ge -\lambda\right\} \\
  &= \max\left\{n : \prod_{i=1}^{n} U_i \ge e^{-\lambda}\right\}
\end{aligned}
$$

⁵ This method is presented on page 887.

We can then simulate the Poisson random variable with the following algorithm:

1. set $n = 0$ and $p = 1$;

2. calculate $n = n + 1$ and $p = p \cdot u_i$ where $u_i$ is a uniform random variate;

3. if $p \ge e^{-\lambda}$, go back to step 2; otherwise, return $X = n - 1$.
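The three steps can be sketched as follows in Python. The uniform variates are passed as an iterator so that the algorithm can be traced by hand; the names `poisson_variate` and `uniform_stream` are hypothetical.

```python
import math
import random


def poisson_variate(lam, uniforms):
    """Multiply uniform variates until the product falls below exp(-lam)."""
    threshold = math.exp(-lam)
    n, p = 0, 1.0                      # step 1
    while True:
        n += 1                         # step 2
        p *= next(uniforms)
        if p < threshold:              # step 3
            return n - 1


def uniform_stream(rng):
    """Endless stream of uniform random variates."""
    while True:
        yield rng.random()


stream = uniform_stream(random.Random(42))
draws = [poisson_variate(3.0, stream) for _ in range(10000)]
print(sum(draws) / len(draws))  # should be close to lam = 3
```

Since the expected number of uniform draws grows with $\lambda$, this approach is mostly suited to small or moderate intensities.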


13.1.2.3 Rejection sampling 


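Following Devroye (1986), $F(x)$ and $G(x)$ are two distribution functions such that
$f(x) \le c\,g(x)$ for all $x$ with $c > 1$. We note $X \sim G$ and consider an independent uniform
random variable $U \sim \mathcal{U}_{[0,1]}$. Then, the conditional distribution function of $X$ given that
$U \le f(X)/(c\,g(X))$ is $F(x)$.

Let us introduce the random variables $B$ and $Z$:

$$
\begin{aligned}
B &= \mathbb{1}\left\{U \le \frac{f(X)}{c\,g(X)}\right\} \\
Z &= X \;\Big|\; U \le \frac{f(X)}{c\,g(X)}
\end{aligned}
$$

We have:

$$
\begin{aligned}
\Pr\{B = 1\} &= \Pr\left\{U \le \frac{f(X)}{c\,g(X)}\right\} \\
             &= \mathbb{E}\left[\frac{f(X)}{c\,g(X)}\right] \\
             &= \int_{-\infty}^{+\infty} \frac{f(x)}{c\,g(x)}\, g(x)\, \mathrm{d}x \\
             &= \frac{1}{c} \int_{-\infty}^{+\infty} f(x)\, \mathrm{d}x \\
             &= \frac{1}{c}
\end{aligned}
$$

The distribution function of $Z$ is defined by:

$$
\Pr\{Z \le x\} = \Pr\left\{X \le x \;\Big|\; U \le \frac{f(X)}{c\,g(X)}\right\}
$$

We deduce that:

$$
\begin{aligned}
\Pr\{Z \le x\} &= \frac{\Pr\left\{X \le x,\ U \le \frac{f(X)}{c\,g(X)}\right\}}{\Pr\left\{U \le \frac{f(X)}{c\,g(X)}\right\}} \\
               &= c \int_{-\infty}^{x} \int_{0}^{f(y)/(c\,g(y))} g(y)\, \mathrm{d}u\, \mathrm{d}y \\
               &= c \int_{-\infty}^{x} \frac{f(y)}{c\,g(y)}\, g(y)\, \mathrm{d}y \\
               &= \int_{-\infty}^{x} f(y)\, \mathrm{d}y \\
               &= F(x)
\end{aligned}
$$

This proves that $Z \sim F$. From this theorem, we deduce the following acceptance-rejection
algorithm:

1. generate two independent random variates $x$ and $u$ from $G$ and $\mathcal{U}_{[0,1]}$;

2. calculate $v$ as follows:

$$
v = \frac{f(x)}{c\,g(x)}
$$

3. if $u \le v$, return $x$ (‘accept’); otherwise, go back to step 1 (‘reject’).


The underlying idea of this algorithm is then to simulate the distribution function $F$ by
assuming that it is easier to generate random numbers from $G$, which is called the proposal distribution. However, some of these random numbers must be ‘rejected’, because the
function $c \cdot g(x)$ ‘dominates’ the density function $f(x)$.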


Remark 150 We notice that the number of iterations $N$ needed to successfully generate $Z$ has a geometric distribution $\mathcal{G}(p)$, where $p = \Pr\{B = 1\} = c^{-1}$ is the acceptance ratio. We deduce that the average number of iterations is equal to $E[N] = 1/p = c$. In order to maximize the efficiency (or the acceptance ratio) of the algorithm, we have to choose the smallest admissible constant $c$, that is:

$$
c = \sup_{x} \frac{f(x)}{g(x)}
$$


Let us consider the normal distribution $\mathcal{N}(0,1)$. We use the Cauchy distribution function as the proposal distribution, whose probability density function is given by:

$$
g(x) = \frac{1}{\pi \left( 1 + x^2 \right)}
$$

We can show that:

$$
\phi(x) \leq \frac{\sqrt{2\pi}}{e^{0.5}}\, g(x)
$$

meaning that $c \approx 1.52$. In Figure 13.3, we report the functions $f(x) = \phi(x)$ and $c \cdot g(x)$.

FIGURE 13.3: Rejection sampling applied to the normal distribution (curves: $f(x)$ for the normal density and $c \cdot g(x)$ for the scaled Cauchy density)


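The goal of the acceptance-rejection algorithm is to ‘eliminate’ the random numbers, which are located in the cross-hatched region. Concerning the Cauchy distribution, we have:

$$
G(x) = \frac{1}{2} + \frac{1}{\pi} \arctan x
$$

and:

$$
G^{-1}(u) = \tan\left( \pi \left( u - \frac{1}{2} \right) \right)
$$

Therefore, we deduce the following algorithm for simulating the distribution function $\mathcal{N}(0,1)$:

1. generate two independent uniform random variates $u_1$ and $u_2$ and set:

$$
x \leftarrow \tan\left( \pi \left( u_1 - \frac{1}{2} \right) \right)
$$

2. calculate $v$ as follows:

$$
v = \frac{\phi(x)}{c\,g(x)} = \frac{\left( 1 + x^2 \right)}{2}\, e^{\left( 1 - x^2 \right)/2}
$$

3. if $u_2 \leq v$, accept $x$; otherwise, go back to step 1.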
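The algorithm above can be sketched in Python as follows. This is a minimal illustration, assuming the closed form of $v$ obtained with $c = \sqrt{2\pi/e}$; the function names, the seed and the sample size are hypothetical choices made for the example.

```python
import math
import random

# Constant c = sqrt(2*pi/e), about 1.52, such that phi(x) <= c * g(x).
C = math.sqrt(2.0 * math.pi / math.e)

def acceptance_threshold(x):
    """v = phi(x) / (c * g(x)) for the normal target and the Cauchy proposal."""
    return 0.5 * (1.0 + x * x) * math.exp(0.5 * (1.0 - x * x))

def simulate_normal(rng):
    """Return one N(0,1) variate and the number of proposals it required."""
    trials = 0
    while True:
        trials += 1
        u1, u2 = rng.random(), rng.random()
        x = math.tan(math.pi * (u1 - 0.5))    # Cauchy variate by inversion
        if u2 <= acceptance_threshold(x):     # accept, otherwise try again
            return x, trials

# Hypothetical usage: 20000 accepted draws.
rng = random.Random(123)
draws = [simulate_normal(rng) for _ in range(20000)]
z = [d[0] for d in draws]
acceptance_ratio = len(draws) / sum(d[1] for d in draws)
```

The empirical acceptance ratio should be close to $1/c$, and `acceptance_threshold` reproduces the column $v$ of Table 13.3 (for example, $x = -0.6705$ gives about 0.9544).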


To illustrate this algorithm, we have simulated six Gaussian distributed random variates in Table 13.3. We notice that four simulations have been rejected. Using 1000 simulations of Cauchy random variates, we obtained the density given in Figure 13.4, which is very close to the exact probability density function. In our case, we accept 683 simulations, meaning that the acceptance ratio⁶ is 68.3%.


TABLE 13.3: Simulation of the standard Gaussian distribution using the acceptance-rejection algorithm

  u_1      u_2         x         v      test        z
0.9662   0.1291     9.3820   0.0000   reject
0.0106   0.1106   -30.0181   0.0000   reject
0.3120   0.8253    -0.6705   0.9544   accept   -0.6705
0.9401   0.9224     5.2511   0.0000   reject
0.2170   0.4461    -1.2323   0.9717   accept   -1.2323
0.6324   0.0676     0.4417   0.8936   accept    0.4417
0.6577   0.1344     0.5404   0.9204   accept    0.5404
0.1596   0.6670    -1.8244   0.6756   accept   -1.8244
0.4183   0.3872    -0.2625   0.8513   accept   -0.2625
0.9625   0.0752     8.4490   0.0000   reject


FIGURE 13.4: Comparison of the exact and simulated densities (legend: exact density, simulated density)


Remark 151 The discrete case is analogous to the continuous case. Let $p(k)$ and $q(k)$ be the probability mass functions of $Z$ and $X$ such that $p(k) \leq c\,q(k)$ for all $k$ with $c > 1$. We consider an independent uniform random variable $U \sim \mathcal{U}_{[0,1]}$. Then, the conditional pmf of $X$ given that $U \leq p(X) / (c\,q(X))$ is the pmf $p(k)$ of $Z$.


⁶ The theoretical acceptance ratio is equal to $1/1.52 \approx 65.8\%$.

13.1.2.4 Method of mixtures


A finite mixture can be decomposed as a weighted sum of distribution functions. We have:

$$
F(x) = \sum_{k=1}^{n} \pi_k \cdot G_k(x)
$$

where $\pi_k \geq 0$ and $\sum_{k=1}^{n} \pi_k = 1$. We deduce that the probability density function is:

$$
f(x) = \sum_{k=1}^{n} \pi_k \cdot g_k(x)
$$

To simulate the probability distribution $F$, we introduce the random variable $B$, whose probability mass function is defined by:

$$
p(k) = \Pr\{B = k\} = \pi_k
$$

It follows that:

$$
F(x) = \sum_{k=1}^{n} \Pr\{B = k\} \cdot G_k(x)
$$

We deduce the following algorithm:

1. generate the random variate $b$ from the probability mass function $p(k)$;
2. generate the random variate $x$ from the probability distribution $G_b(x)$.
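Example 140 We assume that the default time $\tau$ follows the hyper-exponential model:

$$
f(t) = \pi \cdot \lambda_1 e^{-\lambda_1 t} + (1 - \pi) \cdot \lambda_2 e^{-\lambda_2 t}
$$

To simulate this model, we consider the following algorithm:

1. we generate $u$ and $v$ two independent uniform random numbers;

2. we have:

$$
b = \begin{cases} 1 & \text{if } u \leq \pi \\ 2 & \text{otherwise} \end{cases}
$$

3. the simulated value of $\tau$ is:

$$
t = \begin{cases} -\lambda_1^{-1} \ln v & \text{if } b = 1 \\ -\lambda_2^{-1} \ln v & \text{if } b = 2 \end{cases}
$$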

Remark 152 The previous approach can be easily extended to continuous mixtures:

$$
f(x) = \int_{\Omega} \pi(\omega)\, g(x; \omega)\, \mathrm{d}\omega
$$

where $\omega \in \Omega$ is a parameter of the distribution $G$. For instance, we have seen that the negative binomial distribution is a gamma-Poisson mixture distribution:

$$
\begin{cases}
\mathcal{NB}(r, p) \sim \mathcal{P}(\Lambda) \\
\Lambda \sim \mathcal{G}\left( r, (1 - p)/p \right)
\end{cases}
$$

To simulate the negative binomial distribution, we first simulate the gamma random variate $g \sim \mathcal{G}\left( r, (1 - p)/p \right)$, and then simulate the Poisson random variable, whose parameter⁷ $\lambda$ is equal to $g$.


⁷ This means that the parameter $\lambda$ changes at each simulation.

13.1.3 Generating random vectors


In this section, we consider algorithms for simulating a random vector $X = (X_1, \ldots, X_n)$ from a given distribution function $F(x) = F(x_1, \ldots, x_n)$. In fact, the previous methods used to generate a random variable are still valid in the multidimensional case.


13.1.3.1 Method of conditional distributions 


The method of inversion cannot be applied in the multivariate case, because $U = F(X_1, \ldots, X_n)$ is no longer a uniform random variable. However, if $X_1, \ldots, X_n$ are independent, we have:

$$
F(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_i(x_i)
$$

To simulate $X$, we can then generate each component $X_i \sim F_i$ individually, for example by applying the method of inversion. When $X_1, \ldots, X_n$ are dependent, we have:

$$
\begin{aligned}
F(x_1, \ldots, x_n) &= F_1(x_1)\, F_{2|1}(x_2 \mid x_1)\, F_{3|1,2}(x_3 \mid x_1, x_2) \times \cdots \times F_{n|1,\ldots,n-1}(x_n \mid x_1, \ldots, x_{n-1}) \\
&= \prod_{i=1}^{n} F_{i|1,\ldots,i-1}(x_i \mid x_1, \ldots, x_{i-1})
\end{aligned}
$$

where $F_{i|1,\ldots,i-1}(x_i \mid x_1, \ldots, x_{i-1})$ is the conditional distribution of $X_i$ given $X_1 = x_1, \ldots, X_{i-1} = x_{i-1}$. Let us denote this ‘conditional’ random variable $Y_i$. We notice that the random variables $(Y_1, \ldots, Y_n)$ are independent. Therefore, the underlying idea of the method of conditional distributions is to transform the random vector $X$ by a vector $Y$ of independent random variables. We obtain the following algorithm:


1. generate $x_1$ from $F_1(x)$ and set $i = 2$;

2. generate $x_i$ from $F_{i|1,\ldots,i-1}(x \mid x_1, \ldots, x_{i-1})$ given $X_1 = x_1, \ldots, X_{i-1} = x_{i-1}$ and set $i = i + 1$;

3. repeat step 2 until $i = n$.

$F_{i|1,\ldots,i-1}(x \mid x_1, \ldots, x_{i-1})$ is a univariate distribution function, which depends on the argument $x$ and parameters $x_1, \ldots, x_{i-1}$. To simulate it, we can therefore use the method of inversion:

$$
x_i \leftarrow F_{i|1,\ldots,i-1}^{-1}(u_i \mid x_1, \ldots, x_{i-1})
$$

where $F_{i|1,\ldots,i-1}^{-1}$ is the inverse of the conditional distribution function and $u_i$ is a uniform random variate.


Example 141 We consider the bivariate logistic distribution defined as:

$$
F(x_1, x_2) = \left( 1 + e^{-x_1} + e^{-x_2} \right)^{-1}
$$

We have $F_1(x_1) = F(x_1, +\infty) = \left( 1 + e^{-x_1} \right)^{-1}$. We deduce that the conditional distribution of $X_2$ given $X_1 = x_1$ is:

$$
F_{2|1}(x_2 \mid x_1) = \frac{F(x_1, x_2)}{F_1(x_1)} = \frac{1 + e^{-x_1}}{1 + e^{-x_1} + e^{-x_2}}
$$

We obtain:

$$
F_1^{-1}(u) = \ln u - \ln(1 - u)
$$

and:

$$
F_{2|1}^{-1}(u \mid x_1) = \ln u - \ln(1 - u) - \ln\left( 1 + e^{-x_1} \right)
$$

We deduce the following algorithm:

1. generate two independent uniform random variates $u_1$ and $u_2$;

2. generate $x_1$ from $u_1$:

$$
x_1 \leftarrow \ln u_1 - \ln(1 - u_1)
$$

3. generate $x_2$ from $u_2$ and $x_1$:

$$
x_2 \leftarrow \ln u_2 - \ln(1 - u_2) - \ln\left( 1 + e^{-x_1} \right)
$$

Because we have $\left( 1 + e^{-x_1} \right)^{-1} = u_1$, the last step can be replaced by:

3’. generate $x_2$ from $u_2$ and $u_1$:

$$
x_2 \leftarrow \ln\left( \frac{u_1 u_2}{1 - u_2} \right)
$$

The method of conditional distributions can be used for simulating uniform random vectors $(U_1, \ldots, U_n)$ generated by copula functions. In this case, we have:
$$
\begin{aligned}
C(u_1, \ldots, u_n) &= C_1(u_1)\, C_{2|1}(u_2 \mid u_1)\, C_{3|1,2}(u_3 \mid u_1, u_2) \times \cdots \times C_{n|1,\ldots,n-1}(u_n \mid u_1, \ldots, u_{n-1}) \\
&= \prod_{i=1}^{n} C_{i|1,\ldots,i-1}(u_i \mid u_1, \ldots, u_{i-1})
\end{aligned}
$$

where $C_{i|1,\ldots,i-1}(u_i \mid u_1, \ldots, u_{i-1})$ is the conditional distribution of $U_i$ given $U_1 = u_1, \ldots, U_{i-1} = u_{i-1}$. By definition, we have $C_1(u_1) = u_1$. We obtain the following algorithm:

1. generate $n$ independent uniform random variates $v_1, \ldots, v_n$;
2. generate $u_1 \leftarrow v_1$ and set $i = 2$;

3. generate $u_i$ by finding the root of the equation:

$$
C_{i|1,\ldots,i-1}(u_i \mid u_1, \ldots, u_{i-1}) = v_i
$$

and set $i = i + 1$;
4. repeat step 3 until $i = n$.

For some copula functions, there exists an analytical expression of the inverse of the conditional copula. In this case, the third step is replaced by:


3’. generate $u_i$ by the inversion method:

$$
u_i \leftarrow C_{i|1,\ldots,i-1}^{-1}(v_i \mid u_1, \ldots, u_{i-1})
$$

Remark 153 For any probability distribution, the conditional distribution can be calculated as follows:

$$
F_{i|1,\ldots,i-1}(x_i \mid x_1, \ldots, x_{i-1}) = \frac{F(x_1, \ldots, x_{i-1}, x_i)}{F(x_1, \ldots, x_{i-1})}
$$

In particular, we have:

$$
\begin{aligned}
\partial_1 F(x_1, x_2) &= \partial_1 \left( F_1(x_1) \cdot F_{2|1}(x_2 \mid x_1) \right) \\
&= f_1(x_1) \cdot F_{2|1}(x_2 \mid x_1)
\end{aligned}
$$

For copula functions, the density $f_1(x_1)$ is equal to 1, meaning that:

$$
C_{2|1}(u_2 \mid u_1) = \partial_1 C(u_1, u_2)
$$

We can generalize this result and show that the conditional copula given some random variables $U_i$ for $i \in \Omega$ is equal to the cross-derivative of the copula function $C$ with respect to the arguments $u_i$ for $i \in \Omega$.


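We recall that Archimedean copulas are defined as:

$$
C(u_1, u_2) = \varphi^{-1}\left( \varphi(u_1) + \varphi(u_2) \right)
$$

where $\varphi(u)$ is the generator function. We have:

$$
\varphi\left( C(u_1, u_2) \right) = \varphi(u_1) + \varphi(u_2)
$$

and:

$$
\varphi'\left( C(u_1, u_2) \right) \cdot \frac{\partial C(u_1, u_2)}{\partial u_1} = \varphi'(u_1)
$$

We deduce the following expression of the conditional copula:

$$
C_{2|1}(u_2 \mid u_1) = \frac{\partial C(u_1, u_2)}{\partial u_1} = \frac{\varphi'(u_1)}{\varphi'\left( \varphi^{-1}\left( \varphi(u_1) + \varphi(u_2) \right) \right)}
$$

The calculation of the inverse function gives:

$$
C_{2|1}^{-1}(v \mid u_1) = \varphi^{-1}\left( \varphi\left( \varphi'^{-1}\left( \frac{\varphi'(u_1)}{v} \right) \right) - \varphi(u_1) \right)
$$

We obtain the following algorithm for simulating Archimedean copulas:

1. generate two independent uniform random variates $v_1$ and $v_2$;
2. generate $u_1 \leftarrow v_1$;

3. generate $u_2$ by the inversion method:

$$
u_2 \leftarrow \varphi^{-1}\left( \varphi\left( \varphi'^{-1}\left( \frac{\varphi'(u_1)}{v_2} \right) \right) - \varphi(u_1) \right)
$$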


Example 142 We consider the Clayton copula:

$$
C(u_1, u_2) = \left( u_1^{-\theta} + u_2^{-\theta} - 1 \right)^{-1/\theta}
$$

The Clayton copula is an Archimedean copula, whose generator function is:

$$
\varphi(u) = u^{-\theta} - 1
$$

We deduce that:

$$
\begin{aligned}
\varphi^{-1}(u) &= (1 + u)^{-1/\theta} \\
\varphi'(u) &= -\theta u^{-(\theta + 1)} \\
\varphi'^{-1}(u) &= (-u/\theta)^{-1/(\theta + 1)}
\end{aligned}
$$

After some calculations, we obtain:

$$
C_{2|1}^{-1}(v \mid u_1) = \left( 1 + u_1^{-\theta} \left( v^{-\theta/(\theta + 1)} - 1 \right) \right)^{-1/\theta}
$$

In Table 13.4, we simulate five realizations of the Clayton copula using the inverse function of the conditional copula. In the case $\theta = 0.01$, $u_2$ is close to $v_2$ because the Clayton copula is the product copula $C^{\perp}$ when $\theta$ tends to 0. In the case $\theta = 1.5$, we note the impact of the conditional copula on the simulation of $u_2$.


TABLE 13.4: Simulation of the Clayton copula

Random uniform        Clayton copula
   variates          theta = 0.01        theta = 1.5
  v_1      v_2       u_1      u_2       u_1      u_2
0.2837   0.4351    0.2837   0.4342    0.2837   0.3296
0.0386   0.2208    0.0386   0.2134    0.0386   0.0297
0.3594   0.5902    0.3594   0.5901    0.3594   0.5123
0.3612   0.3268    0.3612   0.3267    0.3612   0.3247
0.0797   0.6479    0.0797   0.6436    0.0797   0.1704
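The inversion step of Example 142 can be sketched in Python as follows. The function names and the seed are hypothetical; the formula is the inverse conditional Clayton copula given above, and the first row of Table 13.4 is used as a check.

```python
import random

def clayton_conditional_inverse(v, u1, theta):
    """Inverse of the conditional Clayton copula C_{2|1}(. | u1) evaluated at v."""
    return (1.0 + u1 ** (-theta) * (v ** (-theta / (theta + 1.0)) - 1.0)) ** (-1.0 / theta)

def simulate_clayton_pair(theta, rng):
    """Simulate (u1, u2) from the bivariate Clayton copula by conditional inversion."""
    v1 = 1.0 - rng.random()     # uniform on (0, 1], avoids 0 ** (-theta)
    v2 = 1.0 - rng.random()
    u1 = v1
    u2 = clayton_conditional_inverse(v2, u1, theta)
    return u1, u2

# First row of Table 13.4: (v1, v2) = (0.2837, 0.4351).
u2_low_dependence = clayton_conditional_inverse(0.4351, 0.2837, 0.01)
u2_high_dependence = clayton_conditional_inverse(0.4351, 0.2837, 1.5)

rng = random.Random(2020)
pairs = [simulate_clayton_pair(1.5, rng) for _ in range(1000)]
```

With $\theta = 0.01$ the value of $u_2$ stays close to $v_2$, whereas with $\theta = 1.5$ it is pulled towards $u_1$, as in the table.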


13.1.3.2 Method of transformation 


To simulate a Gaussian random vector $X \sim \mathcal{N}(\mu, \Sigma)$, we consider the following transformation:

$$
X = \mu + A \cdot N
$$

where $A A^{\top} = \Sigma$ and $N \sim \mathcal{N}(0, I)$. Therefore, we can simulate a correlated Gaussian random vector by using $n$ independent Gaussian random variates $\mathcal{N}(0,1)$ and finding a square matrix $A$ such that $A A^{\top} = \Sigma$. Since we know that $\Sigma$ is a positive definite symmetric matrix, it has a unique Cholesky decomposition:

$$
\Sigma = P P^{\top}
$$

where $P$ is a lower triangular matrix.

Remark 154 The decomposition $A A^{\top} = \Sigma$ is not unique. For instance, if we use the eigendecomposition:

$$
\Sigma = U \Lambda U^{\top}
$$

we can set $A = U \Lambda^{1/2}$. Indeed, we have:

$$
\begin{aligned}
A A^{\top} &= U \Lambda^{1/2} \Lambda^{1/2} U^{\top} \\
&= U \Lambda U^{\top} \\
&= \Sigma
\end{aligned}
$$

To simulate a multivariate Student’s t distribution $Y = (Y_1, \ldots, Y_n) \sim \mathcal{T}_n(\Sigma, \nu)$, we use the relationship:

$$
Y_i = \frac{X_i}{\sqrt{Z/\nu}}
$$

where the random vector $X = (X_1, \ldots, X_n) \sim \mathcal{N}(0, \Sigma)$ and the random variable $Z \sim \chi^2(\nu)$ are independent.
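The two transformations above (Gaussian vector through the Cholesky factor, Student’s t vector through the chi-square mixing variable) can be sketched with the Python standard library only. The covariance matrix, the mean vector, the degrees of freedom and the function names below are assumptions chosen for the illustration.

```python
import math
import random

def cholesky(sigma):
    """Lower triangular P such that P P^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(p[i][k] * p[j][k] for k in range(j))
            if i == j:
                p[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                p[i][j] = (sigma[i][j] - s) / p[j][j]
    return p

def simulate_gaussian(mu, p, rng):
    """X = mu + P N, where N is a vector of independent N(0,1) variates."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(p[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

def simulate_student(p, nu, rng):
    """Y_i = X_i / sqrt(Z / nu) with X ~ N(0, Sigma) and Z ~ chi-square(nu)."""
    x = simulate_gaussian([0.0] * len(p), p, rng)
    z = rng.gammavariate(nu / 2.0, 2.0)   # chi-square(nu) is gamma(shape nu/2, scale 2)
    return [xi / math.sqrt(z / nu) for xi in x]

# Hypothetical 2 x 2 covariance matrix; its Cholesky factor is close to [[2, 0], [0.6, 0.8]].
sigma = [[4.0, 1.2], [1.2, 1.0]]
P = cholesky(sigma)
rng = random.Random(2024)
xs = [simulate_gaussian([1.0, -1.0], P, rng) for _ in range(20000)]
ys = [simulate_student(P, 4.0, rng) for _ in range(5)]
```

Any other matrix $A$ with $A A^{\top} = \Sigma$, such as $U \Lambda^{1/2}$ from Remark 154, could replace `P` in `simulate_gaussian`; the Cholesky factor is used here because it is triangular and cheap to compute.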


The transformation method is particularly useful for simulating copula functions. Indeed, if $X = (X_1, \ldots, X_n) \sim F$, then the probability distribution of the random vector $U = (U_1, \ldots, U_n)$ defined by:

$$
U_i = F_i(X_i)
$$

is the copula function $C$ associated to $F$.

Example 143 To simulate the Normal copula with the matrix of parameters $\rho$, we simulate $N \sim \mathcal{N}(0, I)$ and apply the transformation:

$$
U = \Phi(P \cdot N)
$$

where $P$ is the Cholesky decomposition of the correlation matrix $\rho$.

Example 144 To simulate the Student’s t copula with the matrix of parameters $\rho$ and $\nu$ degrees of freedom, we simulate $T \sim \mathcal{T}_n(\rho, \nu)$ and apply the transformation:

$$
U_i = \mathcal{T}_{\nu}(T_i)
$$

In Figures 13.5 and 13.6, we draw 1024 simulations of Normal and $t_1$ copulas for different values of $\rho$. We notice that the Student’s t copula correlates the extreme values more than the Normal copula.


On page 735, frailty copulas have been defined as:

$$
C(u_1, \ldots, u_n) = \psi\left( \psi^{-1}(u_1) + \ldots + \psi^{-1}(u_n) \right)
$$

where $\psi(x)$ is the Laplace transform of a random variable $X$. Using the mixture representation of frailty copulas, Marshall and Olkin (1988) showed that they can be generated using the following algorithm:

1. simulate $n$ independent uniform random variates $v_1, \ldots, v_n$;
2. simulate the frailty random variate $x$ with the Laplace transform $\psi$;

3. apply the transformation:

$$
(u_1, \ldots, u_n) \leftarrow \left( \psi\left( -\frac{\ln v_1}{x} \right), \ldots, \psi\left( -\frac{\ln v_n}{x} \right) \right)
$$

For instance, the Clayton copula is a frailty copula where $\psi(x) = (1 + x)^{-1/\theta}$ is the Laplace transform of the gamma random variable $\mathcal{G}(1/\theta, 1)$. Therefore, the algorithm to simulate the Clayton copula is:

$$
\begin{cases}
x \leftarrow \mathcal{G}(1/\theta, 1) \\
(u_1, \ldots, u_n) \leftarrow \left( \left( 1 - \dfrac{\ln v_1}{x} \right)^{-1/\theta}, \ldots, \left( 1 - \dfrac{\ln v_n}{x} \right)^{-1/\theta} \right)
\end{cases}
$$

Examples of simulating the Clayton copula using this algorithm are given in Figure 13.7.

FIGURE 13.6: Simulation of the $t_1$ copula

FIGURE 13.7: Simulation of the Clayton copula


Remark 155 For other frailty copulas, the reader can refer to the survey of McNeil (2008) 
for the list of Laplace transforms and corresponding algorithms to simulate the frailty random 
variable. 
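For the Clayton case, the frailty algorithm of Marshall and Olkin (1988) can be sketched in Python as follows. The function name, the seed, the dimension and the value $\theta = 2$ are assumptions chosen for the example; `random.gammavariate` takes a shape and a scale parameter, and both conventions coincide here because the second parameter of $\mathcal{G}(1/\theta, 1)$ is equal to 1.

```python
import math
import random

def simulate_clayton_frailty(n, theta, rng):
    """Simulate (u_1, ..., u_n) from the Clayton copula with the frailty algorithm."""
    x = rng.gammavariate(1.0 / theta, 1.0)     # frailty variate x ~ G(1/theta, 1)
    u = []
    for _ in range(n):
        v = 1.0 - rng.random()                 # uniform on (0, 1], avoids log(0)
        u.append((1.0 - math.log(v) / x) ** (-1.0 / theta))
    return u

# Hypothetical usage: 20000 bivariate draws with theta = 2.
rng = random.Random(88)
draws = [simulate_clayton_frailty(2, 2.0, rng) for _ in range(20000)]
mean_u1 = sum(d[0] for d in draws) / len(draws)
```

Compared with the conditional inversion method, this approach needs a single gamma variate whatever the dimension $n$, which makes it convenient for large baskets.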


We now consider the multivariate distribution $F(x_1, \ldots, x_n)$, whose canonical decomposition is defined as:

$$
F(x_1, \ldots, x_n) = C\left( F_1(x_1), \ldots, F_n(x_n) \right)
$$

We recall that if $(U_1, \ldots, U_n) \sim C$, the random vector $(X_1, \ldots, X_n) = \left( F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n) \right)$ follows the distribution function $F$. We deduce the following algorithm:

$$
\begin{cases}
(u_1, \ldots, u_n) \leftarrow C \\
(x_1, \ldots, x_n) \leftarrow \left( F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n) \right)
\end{cases}
$$

Let us consider that the default time $\tau$ and the loss given default LGD of one counterparty are distributed according to the exponential distribution $\mathcal{E}(5\%)$ and the beta distribution $\mathcal{B}(2, 2)$. We also assume that the default time and the loss given default are correlated and the dependence function is a Clayton copula. In Figure 13.8, we use the Clayton random variates generated in Figure 13.7 and apply exponential and beta inverse transforms to them. For the beta distribution, we use the Newton-Raphson algorithm to generate the LGD random variable.


FIGURE 13.8: Simulation of the correlated random vector $(\tau, \mathrm{LGD})$

The previous algorithms suppose that we know the analytical expression $F_i$ of the univariate probability distributions in order to calculate the quantile function $F_i^{-1}$. This is not always the case. For instance, in the operational risk, the loss of the bank is equal to the sum of aggregate losses:

$$
L = \sum_{k=1}^{K} S_k
$$

where $S_k$ is also the sum of individual losses for the $k$-th cell of the mapping matrix. In practice, the probability distribution of $S_k$ is estimated by the method of simulations. In this case, we have to use the method of the empirical quantile function. Let $F_{i,m}$ be the empirical process of $X_i$. We know that:

$$
\sup_{x} \left| F_{i,m}(x) - F_i(x) \right| \rightarrow 0 \quad \text{when } m \rightarrow \infty
$$

We note $U_m$ and $F_m$ the empirical processes corresponding to the distribution functions $C(u_1, \ldots, u_n)$ and $F(x_1, \ldots, x_n)$. The Glivenko-Cantelli theorem tells us that:

$$
\sup_{x_1, \ldots, x_n} \left| F_m(x_1, \ldots, x_n) - F(x_1, \ldots, x_n) \right| \rightarrow 0 \quad \text{when } m \rightarrow \infty
$$

We deduce that:

$$
\sup_{u_1, \ldots, u_n} \left| U_{m_2}\left( F_{1,m_1}^{-1}(u_1), \ldots, F_{n,m_1}^{-1}(u_n) \right) - C\left( F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n) \right) \right| \rightarrow 0
$$

when both $m_1$ and $m_2$ tend to $\infty$. It follows that the method of the empirical quantile function is implemented as follows:


1. for each random variable $X_i$, simulate $m_1$ random variates $x_{i,m}^{\star}$ and estimate the empirical distribution $\hat{F}_i$;
2. simulate a random vector $(u_1, \ldots, u_n)$ from the copula function $C(u_1, \ldots, u_n)$;
3. simulate the random vector $(x_1, \ldots, x_n)$ by inverting the empirical distributions $\hat{F}_i$:

$$
x_i \leftarrow \hat{F}_i^{-1}(u_i)
$$

we also have:

$$
x_i \leftarrow \inf\left\{ x \;\Big|\; \frac{1}{m_1} \sum_{m=1}^{m_1} \mathbf{1}\left\{ x_{i,m}^{\star} \leq x \right\} \geq u_i \right\}
$$

4. repeat steps 2 and 3 $m_2$ times.

In Figure 13.9, we illustrate this algorithm by assuming that $X_1 \sim \mathcal{N}(0,1)$, $X_2 \sim \mathcal{N}(0,1)$ and the dependence function of $(X_1, X_2)$ is the Clayton copula with parameter $\theta = 3$. If we use $m_1 = 50$ simulations to estimate the quantile function of $X_1$ and $X_2$, the approximation is not good. However, when we consider a large number of simulations ($m_1 = 5000$), we obtain simulated values of the random vector $(X_1, X_2)$ that are close to the simulated values calculated with the analytical quantile function $\Phi^{-1}(u)$. We now consider a more complex example. We assume that $X_1 \sim \mathcal{N}(-1, 2)$, $X_2 \sim \mathcal{N}(0, 1)$, $Y_1 \sim \mathcal{G}(0.5)$ and $Y_2 \sim \mathcal{G}(1, 2)$ are four independent random variables. Let $(Z_1 = X_1 + Y_1, Z_2 = X_2 \cdot Y_2)$ be the random vector, whose dependence function is the t copula with parameters $\nu = 2$ and $\rho = -70\%$. There is no simple analytical expression of the marginal distributions of $Z_1$ and $Z_2$. However, the random variables $Z_1$ and $Z_2$ are easy to simulate (Figure 13.10). This is why we can use the method of the empirical quantile function to simulate the random vector $(Z_1, Z_2)$. A sample of 4000 simulated values of the vector $(Z_1, Z_2)$ is reported in Figure 13.11.
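The four steps of the method of the empirical quantile function can be sketched in Python as follows. This is only an illustration under stated assumptions: the two margins (a Gaussian plus an exponential variate, and a Gaussian times a gamma variate) and their parameters are hypothetical stand-ins for variables that are easy to simulate but have no simple quantile function, and the dependence is a Clayton copula simulated with the frailty algorithm rather than the t copula of the example above.

```python
import math
import random

def empirical_quantile(sorted_sample, u):
    """inf{x : F_hat(x) >= u}, where F_hat is the empirical distribution of the sample."""
    m = len(sorted_sample)
    k = max(1, math.ceil(u * m))           # smallest k such that k / m >= u
    return sorted_sample[min(k, m) - 1]

def clayton_pair(theta, rng):
    """Bivariate Clayton variates generated with the frailty algorithm."""
    x = rng.gammavariate(1.0 / theta, 1.0)
    return [(1.0 - math.log(1.0 - rng.random()) / x) ** (-1.0 / theta) for _ in range(2)]

rng = random.Random(11)
m1, m2, theta = 5000, 1000, 3.0

# Step 1: simulate each (hypothetical) margin m1 times and keep the sorted samples.
z1 = sorted(rng.gauss(-1.0, 1.0) + rng.expovariate(0.5) for _ in range(m1))
z2 = sorted(rng.gauss(0.0, 1.0) * rng.gammavariate(1.0, 2.0) for _ in range(m1))

# Steps 2 to 4: draw from the copula and invert the empirical distributions.
simulated = []
for _ in range(m2):
    u1, u2 = clayton_pair(theta, rng)
    simulated.append((empirical_quantile(z1, u1), empirical_quantile(z2, u2)))
```

Sorting each sample once makes every inversion a constant-time index lookup. As noted above, the quality of the approximation depends on $m_1$, so a small value such as $m_1 = 50$ would give a coarse quantile function.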
FIGURE 13.9: Convergence of the method of the empirical quantile function


13.1.4 Generating random matrices 


The simulation of random matrices is a specialized topic, which is generally not covered 
by textbooks. However, the tools presented in this section are very useful in finance. This 
is particularly true when we would like to measure the correlation risk. 


13.1.4.1 Orthogonal and covariance matrices 


FIGURE 13.10: Simulation of the random variables $Z_1$ and $Z_2$

FIGURE 13.11: Simulation of the random vector $(Z_1, Z_2)$

An orthogonal matrix $Q$ is a square $n \times n$ matrix, whose columns and rows are orthonormal vectors:

$$
Q^{\top} Q = Q Q^{\top} = I_n
$$

It follows that $Q^{-1} = Q^{\top}$. Generally, Monte Carlo methods require generation of random orthogonal matrices distributed according to the Haar measure⁸. Anderson et al. (1987) proposed two simple algorithms to generate $Q$:

1. Let $X$ be an $n \times n$ matrix of independent standard Gaussian random variables. $Q$ is the orthogonal matrix of the QR factorization $X = QR$, where $R$ is an upper triangular matrix.

2. Let $X$ be an $n \times p$ matrix of independent standard Gaussian random variables with $p > n$. $Q$ corresponds to the matrix $V$ of the eigendecomposition $X^{\top} X = V \Lambda V^{\top}$ or the matrix $U$ of the singular value decomposition $X^{\top} X = U D V^{\ast}$.


Stewart (1980) proposed another popular algorithm based on the Householder transformation. Let $H_x$ be the symmetric orthogonal matrix defined as:

$$
H_x\, x = \pm \left\| x \right\| \cdot e_1
$$

We consider a series of independent Gaussian random vectors: $x_1 \sim \mathcal{N}_n(0, I_n)$, $x_2 \sim \mathcal{N}_{n-1}(0, I_{n-1})$, etc. We form the matrix $\tilde{H}_k = \operatorname{diag}\left( I_{k-1}, H_{x_k} \right)$. The random orthogonal matrix $Q$ is then generated by the product:


where $D$ is the diagonal matrix with entries $\pm 1$. To illustrate this algorithm, we simulate random orthogonal matrices $Q$ for different values of $n$, and we report the distribution of the eigenvalues of $Q$ in Figure 13.12. We verify that they are almost uniformly distributed on the unit circle.

Remark 156 To simulate a random covariance matrix $\Sigma$ with specified eigenvalues $\lambda_1, \ldots, \lambda_n$, we generate a random orthogonal matrix $Q$ and consider the transformation:

$$
\Sigma = Q \Lambda Q^{\top}
$$

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.


13.1.4.2 Correlation matrices 


A correlation matrix $C$ is a symmetric positive definite matrix, whose diagonal elements are equal to 1. It follows that the sum of the eigenvalues is exactly equal to $n$. The previous algorithm can be used to simulate a random correlation matrix. Indeed, we only need to transform $\Sigma$ into $C$:

$$
C_{i,j} = \frac{\Sigma_{i,j}}{\sqrt{\Sigma_{i,i} \cdot \Sigma_{j,j}}}
$$

However, this method is not always satisfactory, because it does not preserve the specified eigenvalues $\lambda_1, \ldots, \lambda_n$. Let us consider an example with $\lambda_1 = 0.50$, $\lambda_2 = 1.00$ and $\lambda_3 = 1.50$. A simulation of $\Sigma$ gives:

$$
\Sigma = \begin{pmatrix}
1.28570 & -0.12868 & 0.37952 \\
-0.12868 & 0.89418 & 0.16377 \\
0.37952 & 0.16377 & 0.82012
\end{pmatrix}
$$


⁸ Any column or any row of $Q$ has a uniform distribution over the $n$-dimensional unit sphere.

FIGURE 13.12: Distribution of the eigenvalues of simulated random orthogonal matrices (legend: n = 10, n = 20, n = 50)


We deduce the following random correlation matrix:

$$
C = \begin{pmatrix}
1.00000 & -0.12001 & 0.36959 \\
-0.12001 & 1.00000 & 0.19124 \\
0.36959 & 0.19124 & 1.00000
\end{pmatrix}
$$

If we calculate the eigenvalues of $C$, we obtain $\lambda_1 = 1.378$, $\lambda_2 = 1.095$ and $\lambda_3 = 0.527$. The problem comes from the fact that $Q \Lambda Q^{\top}$ generates a covariance matrix with specified eigenvalues, but never a correlation matrix even if the sum of the eigenvalues is equal to $n$.

Bendel and Mickey (1978) proposed an algorithm to transform the matrix $\Sigma$ into a correlation matrix $C$ with specified eigenvalues $\lambda_1, \ldots, \lambda_n$. The main idea is to perform Givens rotations⁹. Let $G_{c,s}(i,j)$ be the Givens matrix, which is equal to the identity matrix except for the entries $(i,i)$ and $(j,j)$ equal to $c$, the entry $(i,j)$ equal to $s$ and the entry $(j,i)$ equal to $-s$:

$$
G_{c,s}(i,j) = \begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix}
$$

such that the $(i,i)$ element¹⁰ of $G_{c,s}(i,j)^{\top}\, \Sigma\, G_{c,s}(i,j)$ is equal to 1. By performing $n$ successive Givens transformations $\Sigma \leftarrow G_{c,s}(i,j)^{\top}\, \Sigma\, G_{c,s}(i,j)$, we obtain a correlation matrix $C$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. The previous algorithm has been extensively studied by Davies and Higham (2000), who showed that:

$$
c = \frac{1}{\sqrt{1 + t^2}} \quad \text{and} \quad s = c \cdot t
$$

where:

$$
t = \frac{\Sigma_{i,j} + \sqrt{\Sigma_{i,j}^2 - \left( \Sigma_{i,i} - 1 \right) \left( \Sigma_{j,j} - 1 \right)}}{\Sigma_{j,j} - 1}
$$

⁹ A Givens rotation is a rotation in the plane spanned by two coordinate axes (Golub and Van Loan, 2013). Because Givens matrices are orthogonal, eigenvalues are not changed.
¹⁰ We have $i < j$ and $\Sigma_{i,i} < 1 < \Sigma_{j,j}$ (or $\Sigma_{i,i} > 1 > \Sigma_{j,j}$).

To show the difference between the Bendel-Mickey algorithm and the previous covariance algorithm, we simulate a correlation matrix of dimension 20 with specified eigenvalues using the two algorithms. In Figure 13.13, we compare the eigenvalues of the simulated correlation matrices with the specified eigenvalues. We verify that the Bendel-Mickey algorithm preserves the spectrum, which is not the case of the covariance algorithm¹¹.

FIGURE 13.13: Comparison of the Bendel-Mickey and covariance algorithms


¹¹ However, we notice that the eigenvalues are close.

We study the correlation risk of the basket option, whose payoff is equal to:

$$
G = \left( S_1(T) - S_2(T) + S_3(T) - S_4(T) - K \right)_{+} \cdot \mathbf{1}\left\{ S_5(T) > L \right\}
$$

where $S_i(T)$ is the price of the $i$-th asset at the maturity $T$. We assume that the dynamics of the asset prices follows a Black-Scholes model:

$$
S_i(T) = S_i(0) \cdot \exp\left( \left( r - \frac{1}{2} \sigma_i^2 \right) T + \sigma_i \left( W_i(T) - W_i(0) \right) \right)
$$

where $r$ is the risk-free rate, $\sigma_i$ is the asset volatility and $W_i$ is a Brownian process. We also assume that the Brownian processes are correlated:

$$
E\left[ W_i(t)\, W_j(t) \right] = \rho_{i,j}\, t
$$

To calculate the price of the basket option, we simulate the terminal value of $S_i(T)$ and average the simulated payoff $G_s$:

$$
P = E\left[ e^{-rT} G \right] \approx \frac{e^{-rT}}{n_S} \sum_{s=1}^{n_S} G_s
$$

where $n_S$ is the number of simulations. We use the following values: $S_i(0) = 100$, $r = 5\%$, $\sigma_i = 20\%$, $T = 0.25$, $K = 5$ and $L = 105$. We consider that it is difficult to estimate the correlation matrix and assume that it is unstable. In this case, we have to find an upper bound for $P$ in order to take into account this correlation risk. Generally, we price the option by using a constant correlation matrix $C_5(\rho)$ and take the supremum:

$$
P^{\star} = \sup_{\rho} P\left( C_5(\rho) \right)
$$

FIGURE 13.14: Price of the basket option (top panel: constant correlation matrix)


In the top panel in Figure 13.14, we report the price of the basket option with respect to the uniform correlation $\rho$. We notice that the price is a decreasing function of $\rho$ and reaches its maximum when the uniform correlation is $-25\%$. Therefore, we could suppose that the upper bound is equal to \$2.20. However, if we consider random correlation matrices, we observe that this price is not conservative (see bottom panels in Figure 13.14). For instance, we obtain a price equal to \$5.45 with the following correlation matrix:

$$
C = \begin{pmatrix}
1.0000 & -0.4682 & -0.3034 & -0.1774 & 0.1602 \\
-0.4682 & 1.0000 & -0.3297 & 0.1381 & -0.7272 \\
-0.3034 & -0.3297 & 1.0000 & -0.3273 & 0.6106 \\
-0.1774 & 0.1381 & -0.3273 & 1.0000 & -0.1442 \\
0.1602 & -0.7272 & 0.6106 & -0.1442 & 1.0000
\end{pmatrix}
$$


This matrix indicates the type of correlation risks we face when we want to hedge this basket 
option. Indeed, the correlation risk is maximum when the fifth asset (which activates the 
barrier) is positively correlated with the first and third assets and negatively correlated with 
the second and fourth assets. 


In the Bendel-Mickey algorithm, we control the structure of the random correlation matrix by specifying the eigenvalues. In finance, this may not be sufficient. For instance, we may want to simulate the matrix $C$ such that its expected value is equal to a given correlation matrix $C^{\star}$:

$$
E[C] = C^{\star}
$$

Let $A$ be a random symmetric matrix with zeros on the diagonal and mean $E[A] = 0$. Marsaglia and Olkin (1984) showed that $C = A + C^{\star}$ is a random correlation matrix with $E[A + C^{\star}] = C^{\star}$ if the 2-norm of $A$ is less than the smallest eigenvalue $\lambda_{\min}$ of $C^{\star}$. There is a variety of algorithms that use this result. For instance, Marsaglia and Olkin (1984) proposed to generate a random correlation matrix $R$ with specified eigenvalues in the interval $[1 - \lambda_{\min}, 1 + \lambda_{\min}]$ and to take $C = (R - I_n) + C^{\star}$.


13.1.4.3 Wishart matrices 


To generate a random Wishart matrix $S$, we simulate $n$ independent Gaussian random vectors $X_i \sim \mathcal{N}_p(0, \Sigma)$ and form the $n \times p$ matrix $X$ by concatenating the random vectors in the following way:

$$
X = \begin{pmatrix} X_1^{\top} \\ \vdots \\ X_n^{\top} \end{pmatrix}
$$

Then, we have $S = X^{\top} X$. The simulation of an inverse Wishart matrix $T$ is straightforward by applying the transformation method:

$$
T = S^{-1}
$$


13.2 Simulation of stochastic processes 


We distinguish two types of time series models, those based on discrete-time stochastic processes and those based on continuous-time stochastic processes. Discrete-time models are easier to simulate, in particular when we consider time-homogeneous Markov processes. This is not the case for continuous-time models, which are generally approximated by a time-discretized process. In this case, the convergence of the discrete simulation to the continuous solution depends on the approximation scheme.

13.2.1 Discrete-time stochastic processes
13.2.1.1 Correlated Markov chains 


We consider a vector $R = (R_1, \ldots, R_M)$ of time-homogeneous Markov chains, whose transition probability matrix is $P$. The simulation of the Markov chain $m$ is given by the following algorithm:

1. we assume the initial position of the Markov chain:

$$
R_m(0) = i_0
$$

2. let $u$ be a uniform random number; we simulate the new position of $R_m$ by inverting the conditional probability distribution, whose elements are:

$$
\Pr\left( R_m(n+1) = i_{n+1} \mid R_m(n) = i_n \right) = p_{i_n, i_{n+1}} = e_{i_n}^{\top} P\, e_{i_{n+1}}
$$

we have:

$$
i_{n+1} = \left\{ k : \sum_{j=1}^{k-1} p_{i_n, j} < u \leq \sum_{j=1}^{k} p_{i_n, j} \right\}
$$

3. we go back to step 2.
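The inversion of the conditional probability distribution can be sketched in Python as follows. The three-state transition matrix, the function names and the seed are hypothetical; states are indexed from 0 in the code, whereas the text indexes them from 1.

```python
import random

def next_state(transition, i, u):
    """Return the state k such that the cumulative probability of row i first reaches u."""
    cumulative = 0.0
    for k, p in enumerate(transition[i]):
        cumulative += p
        if u <= cumulative:
            return k
    return len(transition[i]) - 1      # guard against rounding errors in the row sum

def simulate_chain(transition, i0, n_steps, rng):
    """Simulate one path of a time-homogeneous Markov chain starting from state i0."""
    path = [i0]
    for _ in range(n_steps):
        u = 1.0 - rng.random()         # uniform on (0, 1]
        path.append(next_state(transition, path[-1], u))
    return path

# Hypothetical matrix: state 2 is absorbing, like a default rating.
P = [[0.90, 0.08, 0.02],
     [0.10, 0.80, 0.10],
     [0.00, 0.00, 1.00]]
path = simulate_chain(P, 0, 30, random.Random(5))
```

To simulate several dependent chains, the independent uniform number drawn in `simulate_chain` would be replaced by the components of a vector simulated from a copula, one component per chain.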


We now assume that the dependence between the Markov chains $(R_1, \ldots, R_M)$ is given by a copula function $C$, implying that the Markov chains are correlated. The algorithm becomes:

1. we assume the initial position of the Markov chains:

$$
R_m(0) = i_{m,0}
$$

2. let $(u_1, \ldots, u_M)$ be a vector of correlated uniform random numbers such that:

$$
(u_1, \ldots, u_M) \sim C
$$

3. for each Markov chain $m$, we simulate the new position of $R_m$ by inverting the conditional probability distribution; we have:

$$
i_{m,n+1} = \left\{ k : \sum_{j=1}^{k-1} p_{i_{m,n}, j} < u_m \leq \sum_{j=1}^{k} p_{i_{m,n}, j} \right\}
$$

and $R_m(n+1) = i_{m,n+1}$.
4. we go back to step 2.


We consider four corporate firms, whose initial credit rating is AAA, BBB, B and CCC. We assume that the rating of each company is a Markov chain $R_m$ described by the credit migration matrix given on page 208. We also assume that the dependence of the credit ratings $(R_1, R_2, R_3, R_4)$ is a Normal copula with the following matrix of parameters:

$$
\rho_1 = \begin{pmatrix}
1.00 & & & \\
0.25 & 1.00 & & \\
0.75 & 0.50 & 1.00 & \\
0.50 & 0.25 & 0.75 & 1.00
\end{pmatrix}
$$

In Figures 13.15 and 13.17, we report 10 simulated paths of the ratings for the next 30 years. We verify that the default rating is an absorbing state. Suppose now that the parameter matrix of the Normal copula is equal to:

$$
\rho_2 = \begin{pmatrix}
1.00 & & & \\
-0.25 & 1.00 & & \\
-0.75 & 0.50 & 1.00 & \\
-0.50 & 0.25 & 0.75 & 1.00
\end{pmatrix}
$$

Using this correlation matrix $\rho_2$ instead of the previous matrix $\rho_1$, we obtain the results given in Figures 13.16 and 13.18. If we compare Figures 13.15 and 13.16 (or Figures 13.17 and 13.18), which are based on the same uniform random numbers, we notice that the simulated paths are not the same. The reason comes from the negative correlation between the credit rating of the first company and the other credit ratings.

FIGURE 13.15: Simulation of rating dynamics (correlation matrix $\rho_1$)


13.2.1.2 Time series 


FIGURE 13.16: Simulation of rating dynamics (correlation matrix $\rho_2$)

FIGURE 13.17: Simulation of rating dynamics (correlation matrix $\rho_1$)

FIGURE 13.18: Simulation of rating dynamics (correlation matrix $\rho_2$)

A state space model (SSM) is defined by a measurement equation and a transition equation. The measurement equation describes the relationship between the observed variables $y_t$ and the state vector $\alpha_t$:

$$
y_t = Z_t \alpha_t + d_t + \varepsilon_t
$$

whereas the transition equation gives the dynamics of the state variables:

$$
\alpha_t = T_t \alpha_{t-1} + c_t + R_t \eta_t
$$

The dimension of the vectors $y_t$ and $\alpha_t$ is respectively $n \times 1$ and $m \times 1$. $Z_t$ is a $n \times m$ matrix, $d_t$ is a $n \times 1$ vector, $T_t$ is a $m \times m$ matrix, $c_t$ is a $m \times 1$ vector and $R_t$ is a $m \times p$ matrix. $\varepsilon_t \sim \mathcal{N}_n(0, H_t)$ and $\eta_t \sim \mathcal{N}_p(0, Q_t)$ are two independent white noise processes. By construction, there is no special issue to simulate the Markov process $\alpha_t$ if we assume that the initial position is $\alpha_0 \sim \mathcal{N}_m(a_0, P_0)$. Indeed, we obtain the following algorithm:

1. we simulate the initial position:

$$
\alpha_0 \sim \mathcal{N}_m(a_0, P_0)
$$

2. we simulate the position of the state variable at time $t$:

$$
\alpha_t \sim \mathcal{N}_m\left( T_t \alpha_{t-1} + c_t,\; R_t Q_t R_t^{\top} \right)
$$

3. we simulate the observed variable at time $t$:

$$
y_t \sim \mathcal{N}_n\left( Z_t \alpha_t + d_t,\; H_t \right)
$$

4. we go back to step 2.


Most discrete-time stochastic processes are homogeneous, meaning that the parameters of the state space model are time-independent:

$$
\begin{cases}
y_t = Z \alpha_t + d + \varepsilon_t \\
\alpha_t = T \alpha_{t-1} + c + R \eta_t
\end{cases}
$$

where $\varepsilon_t \sim \mathcal{N}_n(0, H)$ and $\eta_t \sim \mathcal{N}_p(0, Q)$. In this case, the stationary solution of the transition equation is $\alpha^{\star} \sim \mathcal{N}_m(a^{\star}, P^{\star})$ where $a^{\star}$ is equal to $(I_m - T)^{-1} c$ and $P^{\star}$ satisfies the matrix equation¹²:

$$
P^{\star} = T P^{\star} T^{\top} + R Q R^{\top}
$$

¹² We also have (Harvey, 1990):

$$
\operatorname{vec}(P^{\star}) = \left( I_{m^2} - T \otimes T \right)^{-1} \operatorname{vec}\left( R Q R^{\top} \right)
$$

In practice, we generally use the stationary solution to initialize the state space model: $\alpha_0 \sim \mathcal{N}_m(a^{\star}, P^{\star})$.
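As an illustration, the homogeneous case can be sketched in Python for a scalar state and a scalar observation ($n = m = p = 1$), so that the matrix equations reduce to $a^{\star} = c/(1 - T)$ and $P^{\star} = R^2 Q/(1 - T^2)$. The parameter values, the function names and the seed are assumptions chosen for the example.

```python
import math
import random

def stationary_moments(T, c, R, Q):
    """Scalar stationary solution: a* = c / (1 - T) and P* = R^2 Q / (1 - T^2), |T| < 1."""
    a_star = c / (1.0 - T)
    p_star = R * R * Q / (1.0 - T * T)
    return a_star, p_star

def simulate_ssm(n_steps, Z, d, H, T, c, R, Q, rng):
    """Simulate states and observations, starting from the stationary distribution."""
    a_star, p_star = stationary_moments(T, c, R, Q)
    alpha = rng.gauss(a_star, math.sqrt(p_star))                   # step 1
    states, observations = [], []
    for _ in range(n_steps):
        alpha = rng.gauss(T * alpha + c, math.sqrt(R * Q * R))     # step 2
        y = rng.gauss(Z * alpha + d, math.sqrt(H))                 # step 3
        states.append(alpha)
        observations.append(y)
    return states, observations

# Hypothetical parameters: the stationary mean is 5 and the stationary variance is 1.
a_star, p_star = stationary_moments(T=0.8, c=1.0, R=1.0, Q=0.36)
states, observations = simulate_ssm(20000, Z=1.0, d=0.0, H=0.25,
                                    T=0.8, c=1.0, R=1.0, Q=0.36,
                                    rng=random.Random(3))
```

Starting from the stationary distribution avoids a burn-in period. In the multivariate case, the Gaussian draws would be replaced by Gaussian random vectors and $P^{\star}$ would be obtained from the matrix equation above.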

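State space models can be used to simulate structural models, AR and MA processes, vector error correction models, VAR processes, etc. For instance, a VARMA(p,q) model with $K$ endogenous variables is defined by:

$$
\left( I - \sum_{i=1}^{p} \Phi_i L^i \right) y_t = \left( I - \sum_{i=1}^{q} \Theta_i L^i \right) u_t
$$

where $u_t$ is a multidimensional white noise process. Let $\alpha_t$ be the vector process $(y_t, \ldots, y_{t-p+1}, u_t, \ldots, u_{t-q+1})$, whose dimension is $K(p+q)$. We have:

$$
\alpha_t = T \alpha_{t-1} + R u_t
$$

where $R$ is the $K(p+q) \times K$ matrix $\begin{pmatrix} I_K & 0 & \cdots & 0 & I_K & 0 & \cdots & 0 \end{pmatrix}^{\top}$ and $T$ is the $K(p+q) \times K(p+q)$ matrix:

$$
T = \begin{pmatrix}
\Phi_1 & \cdots & \Phi_{p-1} & \Phi_p & -\Theta_1 & \cdots & -\Theta_{q-1} & -\Theta_q \\
I_K & & (0) & 0 & & (0) & & 0 \\
 & \ddots & & \vdots & & & & \vdots \\
(0) & & I_K & 0 & & (0) & & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
 & (0) & & 0 & I_K & & (0) & 0 \\
 & & & \vdots & & \ddots & & \vdots \\
 & (0) & & 0 & (0) & & I_K & 0
\end{pmatrix}
$$

We notice that:

$$
y_t = Z \alpha_t
$$

where $Z$ is the $K \times K(p+q)$ matrix $\begin{pmatrix} I_K & 0 & \cdots & 0 \end{pmatrix}$. We finally obtain the following SSM representation¹³:

$$
\begin{cases}
y_t = Z \alpha_t \\
\alpha_t = T \alpha_{t-1} + R u_t
\end{cases}
$$
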
13.2.2 Univariate continuous-time processes 


13.2.2.1 Brownian motion 


A Brownian motion (or a Wiener process) is a stochastic process $W(t)$, whose increments
are stationary and independent:

$$W(t) - W(s) \sim \mathcal{N}(0, t - s)$$

Therefore, we have:

$$\begin{cases} W(0) = 0 \\ W(t) = W(s) + \varepsilon(s,t) \end{cases}$$

where $\varepsilon(s,t) \sim \mathcal{N}(0, t - s)$ are iid random variables. This representation is helpful to
simulate $W(t)$ at different dates $t_1, t_2, \ldots$ If we note $W_m$ the numerical realization of $W(t_m)$,
we have:

$$W_{m+1} = W_m + \sqrt{t_{m+1} - t_m} \cdot \varepsilon_m$$

where $\varepsilon_m \sim \mathcal{N}(0,1)$ are iid random variables. In the case of fixed-interval times $t_{m+1} - t_m = h$,
we obtain the recursion:

$$W_{m+1} = W_m + \sqrt{h} \cdot \varepsilon_m$$

13 We have $H = 0$, $Q = \operatorname{var}(u_t)$, $d = 0$ and $c = 0$.
13.2.2.2 Geometric Brownian motion


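The geometric Brownian motion is described by the following stochastic differential
equation:

$$\begin{cases} \mathrm{d}X(t) = \mu X(t)\,\mathrm{d}t + \sigma X(t)\,\mathrm{d}W(t) \\ X(0) = x_0 \end{cases}$$

Its solution is given by:

$$X(t) = x_0 \cdot \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma W(t)\right) = g(W(t))$$

Therefore, simulating the geometric Brownian motion $X(t)$ can be done by applying the
transform method to the process $W(t)$.

Another approach to simulate $X(t)$ consists in using the following formula:

$$X(t) = X(s) \cdot \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)(t - s) + \sigma\left(W(t) - W(s)\right)\right)$$

We have:

$$X_{m+1} = X_m \cdot \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)(t_{m+1} - t_m) + \sigma\sqrt{t_{m+1} - t_m} \cdot \varepsilon_m\right)$$

where $X_m = X(t_m)$ and $\varepsilon_m \sim \mathcal{N}(0,1)$ are iid random variables. If we consider fixed-interval
times, the numerical realization becomes:

$$X_{m+1} = X_m \cdot \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right) h + \sigma\sqrt{h} \cdot \varepsilon_m\right) \qquad (13.1)$$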


Example 145 In Figure 13.19, we simulate 10 paths of the geometric Brownian motion
when $\mu$ and $\sigma$ are equal to 10% and 20%. We consider a period of one year with a financial
calendar of 260 trading days. This means that we use a fixed-interval time with $h = 1/260$.
In finance, we use the convention that $t = 1$ corresponds to one year, which implies that $\mu$
and $\sigma$ are respectively the annual expected return and volatility.
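The exact scheme (13.1) with the parameters of Example 145 can be sketched as follows. The function name `simulate_gbm` and the starting value $x_0 = 100$ are assumptions made for this illustration, since the example does not state the initial position.

```python
import numpy as np

def simulate_gbm(x0, mu, sigma, h, n_steps, n_paths, rng):
    """Exact fixed-interval scheme (13.1) for the geometric Brownian motion."""
    eps = rng.standard_normal((n_paths, n_steps))
    # Log-increments (mu - sigma^2 / 2) h + sigma sqrt(h) eps_m
    log_inc = (mu - 0.5 * sigma**2) * h + sigma * np.sqrt(h) * eps
    log_paths = np.cumsum(log_inc, axis=1)
    # Prepend the initial position, so each row is X_0, X_1, ..., X_{n_steps}
    log_paths = np.hstack([np.zeros((n_paths, 1)), log_paths])
    return x0 * np.exp(log_paths)

# 10 paths over one year with 260 trading days (x0 = 100 is an assumption)
paths = simulate_gbm(100.0, 0.10, 0.20, 1.0 / 260, 260, 10,
                     np.random.default_rng(0))
```

Because the scheme works on log-increments, the simulated values stay strictly positive whatever the step size.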


13.2.2.3 Ornstein-Uhlenbeck process 


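The stochastic differential equation of the Ornstein-Uhlenbeck process is:

$$\begin{cases} \mathrm{d}X(t) = a\left(b - X(t)\right)\mathrm{d}t + \sigma\,\mathrm{d}W(t) \\ X(0) = x_0 \end{cases}$$

We can show that the solution of the SDE is:

$$X(t) = x_0 e^{-at} + b\left(1 - e^{-at}\right) + \sigma \int_0^t e^{a(\theta - t)}\,\mathrm{d}W(\theta)$$

We also have:

$$X(t) = X(s)\, e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) + \sigma \int_s^t e^{a(\theta - t)}\,\mathrm{d}W(\theta)$$

where $\int_s^t e^{a(\theta - t)}\,\mathrm{d}W(\theta)$ is a Gaussian white noise process with variance $\left(1 - e^{-2a(t-s)}\right)/(2a)$.
If we consider fixed-interval times, we obtain the following simulation scheme:

$$X_{m+1} = X_m e^{-ah} + b\left(1 - e^{-ah}\right) + \sigma\sqrt{\frac{1 - e^{-2ah}}{2a}} \cdot \varepsilon_m$$

where $\varepsilon_m \sim \mathcal{N}(0,1)$ are iid random variables.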
FIGURE 13.19: Simulation of the geometric Brownian motion


Example 146 We assume that $a = 2$, $b = 10\%$ and $\sigma = 1.5\%$. The initial position of the
process is $x_0 = 5\%$. We simulate $X(t)$ for two years and report the generated paths in Figure
13.20.
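A minimal sketch of the exact Ornstein-Uhlenbeck scheme with the parameters of Example 146 is given below. The function name `simulate_ou` and the daily step $h = 1/260$ are assumptions of this illustration.

```python
import numpy as np

def simulate_ou(x0, a, b, sigma, h, n_steps, rng):
    """Exact fixed-interval scheme for dX = a (b - X) dt + sigma dW."""
    decay = np.exp(-a * h)
    # Standard deviation of the Gaussian innovation over one step
    std = sigma * np.sqrt((1.0 - np.exp(-2.0 * a * h)) / (2.0 * a))
    x = np.empty(n_steps + 1)
    x[0] = x0
    eps = rng.standard_normal(n_steps)
    for m in range(n_steps):
        x[m + 1] = x[m] * decay + b * (1.0 - decay) + std * eps[m]
    return x

# Two years of daily steps (h = 1/260 is an assumption)
path = simulate_ou(0.05, 2.0, 0.10, 0.015, 1.0 / 260, 520,
                   np.random.default_rng(0))
```

Setting $\sigma = 0$ reduces the recursion to the deterministic mean $x_0 e^{-at} + b(1 - e^{-at})$, which gives a simple sanity check of the implementation.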


13.2.2.4 Stochastic differential equations without an explicit solution 


In the case of the geometric Brownian motion or the Ornstein-Uhlenbeck process, we obtain
an exact scheme for simulating these processes, because we know the analytical solution.
In many cases, the solution is not known and can only be simulated using approximation
schemes. Let $X(t)$ be the solution of the following SDE:

$$\begin{cases} \mathrm{d}X(t) = \mu(t, X)\,\mathrm{d}t + \sigma(t, X)\,\mathrm{d}W(t) \\ X(0) = x_0 \end{cases}$$

The simplest numerical method for simulating $X(t)$ is the Euler-Maruyama scheme, which
uses the following approximation:

$$X(t) - X(s) \approx \mu(s, X(s)) \cdot (t - s) + \sigma(s, X(s)) \cdot \left(W(t) - W(s)\right)$$

If we consider fixed-interval times, the Euler-Maruyama scheme becomes:

$$X_{m+1} = X_m + \mu(t_m, X_m)\, h + \sigma(t_m, X_m) \sqrt{h} \cdot \varepsilon_m$$

where $\varepsilon_m \sim \mathcal{N}(0,1)$ are iid random variables.


FIGURE 13.20: Simulation of the Ornstein-Uhlenbeck process

Remark 157 The accuracy of numerical approximations is evaluated with the strong order
of convergence. Let $X_m^{(h)}$ be the numerical solution of $X(t_m)$ computed with the constant
stepsize $h$. A numerical scheme is said to converge strongly to the exact solution if we have:

$$\lim_{h \to 0} \mathbb{E}\left[\left|X_m^{(h)} - X(t_m)\right|\right] = 0$$

for a time $t_m$. The order of convergence is given by the convergence rate $p$:

$$\mathbb{E}\left[\left|X_m^{(h)} - X(t_m)\right|\right] \leq C \cdot h^p$$

where $C$ is a constant and $h$ is sufficiently small ($h \leq h_0$). In the case of the Euler-Maruyama
method, the strong order of convergence is 0.5.


Example 147 For modeling short-term interest rates, Chan et al. (1992) consider the following
SDE:

$$\mathrm{d}X(t) = \left(\alpha + \beta X(t)\right)\mathrm{d}t + \sigma X(t)^{\gamma}\,\mathrm{d}W(t) \qquad (13.2)$$

We deduce that the fixed-interval Euler-Maruyama scheme is:

$$X_{m+1} = X_m + \left(\alpha + \beta X_m\right) h + \sigma X_m^{\gamma} \sqrt{h} \cdot \varepsilon_m$$

Kloeden and Platen (1992) provided many other approximation schemes, based on Itô-Taylor
expansions of the SDE. For instance, the fixed-interval Milstein scheme is:

$$X_{m+1} = X_m + \mu(t_m, X_m)\, h + \sigma(t_m, X_m) \sqrt{h} \cdot \varepsilon_m + \frac{1}{2} \sigma(t_m, X_m)\, \partial_x \sigma(t_m, X_m)\, h \left(\varepsilon_m^2 - 1\right) \qquad (13.3)$$

The strong order of the Milstein method is equal to 1, which is better than the Euler-Maruyama
method. In terms of implementation, these two approximation schemes remain
simple, compared to other Taylor schemes that converge more quickly, but generally use
correlated random variables and high order derivatives of the functions $\mu(t,x)$ and $\sigma(t,x)$.
This is why the Euler-Maruyama and Milstein schemes are the methods most frequently used in
practice$^{14}$.


14 For instance, one of the most famous methods is the strong order 1.5 Taylor scheme proposed by Platen
and Wagner (1982). It requires the second derivatives $\partial_x^2 \mu(t,x)$ and $\partial_x^2 \sigma(t,x)$, and an additional random

If we consider the geometric Brownian motion, the Euler-Maruyama scheme is:

$$X_{m+1} = X_m + \mu X_m h + \sigma X_m \sqrt{h} \cdot \varepsilon_m$$

whereas the Milstein scheme is:

$$\begin{aligned}
X_{m+1} &= X_m + \mu X_m h + \sigma X_m \sqrt{h} \cdot \varepsilon_m + \frac{1}{2}\sigma^2 X_m h \left(\varepsilon_m^2 - 1\right) \\
&= X_m + \left(\mu - \frac{1}{2}\sigma^2\right) X_m h + \sigma X_m \sqrt{h} \left(1 + \frac{1}{2}\sigma\sqrt{h}\,\varepsilon_m\right) \varepsilon_m
\end{aligned}$$

It follows that the Milstein scheme operates two corrections for simulating the GBM process:

• the first correction concerns the drift, which is now correct;

• the second correction applies to the diffusion term, which increases if it is positive and
decreases if it is negative.


In order to illustrate the differences between these two schemes, we compare them using the
same random numbers. A simulation is provided in Figure 13.21 in the case where $\mu = 10\%$
and $\sigma = 50\%$. With a monthly discretization, we notice that the Milstein scheme produces
a better solution than the Euler-Maruyama scheme.

FIGURE 13.21: Comparison of exact, Euler-Maruyama and Milstein schemes (monthly
discretization)
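The comparison of the three schemes for the geometric Brownian motion can be reproduced with the sketch below, which feeds the same Gaussian draws to the exact, Euler-Maruyama and Milstein recursions. The function name `compare_schemes`, the initial value and the number of paths are assumptions of this illustration.

```python
import numpy as np

def compare_schemes(x0, mu, sigma, h, n_steps, n_paths, rng):
    """Terminal values of the exact, Euler-Maruyama and Milstein schemes
    for the GBM, all driven by the same random numbers."""
    eps = rng.standard_normal((n_paths, n_steps))
    exact = np.full(n_paths, float(x0))
    euler = exact.copy()
    milstein = exact.copy()
    sqrt_h = np.sqrt(h)
    for m in range(n_steps):
        e = eps[:, m]
        exact = exact * np.exp((mu - 0.5 * sigma**2) * h + sigma * sqrt_h * e)
        euler = euler + mu * euler * h + sigma * euler * sqrt_h * e
        milstein = (milstein + mu * milstein * h + sigma * milstein * sqrt_h * e
                    + 0.5 * sigma**2 * milstein * h * (e**2 - 1.0))
    return exact, euler, milstein

# Monthly discretization over one year with mu = 10% and sigma = 50%
exact, euler, milstein = compare_schemes(100.0, 0.10, 0.50, 1.0 / 12, 12, 5000,
                                         np.random.default_rng(0))
err_euler = np.mean(np.abs(euler - exact))
err_milstein = np.mean(np.abs(milstein - exact))
```

Under these settings the average absolute error of the Milstein scheme at the horizon should be noticeably smaller than that of the Euler-Maruyama scheme, in line with their strong orders of convergence (1 versus 0.5).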


variable correlated with the increments of the Brownian motion. Even if this scheme is interesting to study
from a theoretical point of view, it is never used by practitioners because it is time-consuming.

When we do not know the analytical solution of $X(t)$, it is natural to simulate the
numerical solution of $X(t)$ using the Euler-Maruyama and Milstein schemes. However, it may
sometimes be more efficient to find the numerical solution of $Y(t) = f(t, X(t))$ instead of
$X(t)$ itself, in particular when $Y(t)$ is more regular than $X(t)$. By Itô's lemma, we have:

$$\mathrm{d}Y(t) = \left(\partial_t f(t,X) + \mu(t,X)\,\partial_x f(t,X) + \frac{1}{2}\sigma^2(t,X)\,\partial_x^2 f(t,X)\right)\mathrm{d}t + \sigma(t,X)\,\partial_x f(t,X)\,\mathrm{d}W(t)$$

By using the inverse function $X(t) = f^{-1}(t, Y(t))$, we obtain:

$$\mathrm{d}Y(t) = \mu'(t, Y)\,\mathrm{d}t + \sigma'(t, Y)\,\mathrm{d}W(t)$$

where $\mu'(t,Y)$ and $\sigma'(t,Y)$ are functions of $\mu(t,X)$, $\sigma(t,X)$ and $f(t,X)$. We can then
simulate the solution of $Y(t)$ using an approximation scheme and deduce the numerical
solution of $X(t)$ by applying the transformation method:

$$X_m = f^{-1}(t_m, Y_m)$$


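Let us consider the geometric Brownian motion $X(t)$. The process $Y(t) = \ln X(t)$
satisfies:

$$\mathrm{d}Y(t) = \left(\mu - \frac{1}{2}\sigma^2\right)\mathrm{d}t + \sigma\,\mathrm{d}W(t)$$

We deduce that the Euler-Maruyama (or Milstein$^{15}$) scheme with fixed-interval times is:

$$Y_{m+1} = Y_m + \left(\mu - \frac{1}{2}\sigma^2\right) h + \sigma\sqrt{h} \cdot \varepsilon_m$$

It follows that:

$$\ln X_{m+1} = \ln X_m + \left(\mu - \frac{1}{2}\sigma^2\right) h + \sigma\sqrt{h} \cdot \varepsilon_m \qquad (13.4)$$

We conclude that this numerical solution is the exact solution (13.1) of the geometric
Brownian motion.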


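The previous application is not interesting, because we know the analytical solution. The
approach is better suited to stochastic differential equations without explicit solutions, for
example the Cox-Ingersoll-Ross process:

$$\mathrm{d}X(t) = \left(\alpha + \beta X(t)\right)\mathrm{d}t + \sigma\sqrt{X(t)}\,\mathrm{d}W(t) \qquad (13.5)$$

This process is a special case of the CKLS process (13.2) with $\gamma = 1/2$ and can be viewed as
an Ornstein-Uhlenbeck process$^{16}$ with a reflection at $X(t) = 0$. Using the transformation
$Y(t) = \sqrt{X(t)}$, we obtain the following SDE$^{17}$:

$$\begin{aligned}
\mathrm{d}Y(t) &= \left(\frac{1}{2}\frac{\alpha + \beta X(t)}{\sqrt{X(t)}} - \frac{1}{8}\frac{\sigma^2 X(t)}{X(t)^{3/2}}\right)\mathrm{d}t + \frac{1}{2}\frac{\sigma\sqrt{X(t)}}{\sqrt{X(t)}}\,\mathrm{d}W(t) \\
&= \frac{1}{2 Y(t)}\left(\alpha + \beta Y^2(t) - \frac{1}{4}\sigma^2\right)\mathrm{d}t + \frac{1}{2}\sigma\,\mathrm{d}W(t)
\end{aligned}$$

We deduce that the Euler-Maruyama scheme of $Y(t)$ is:

$$Y_{m+1} = Y_m + \frac{1}{2 Y_m}\left(\alpha + \beta Y_m^2 - \frac{1}{4}\sigma^2\right) h + \frac{1}{2}\sigma\sqrt{h} \cdot \varepsilon_m$$

It follows that:

$$X_{m+1} = \left(\sqrt{X_m} + \frac{1}{2\sqrt{X_m}}\left(\alpha + \beta X_m - \frac{1}{4}\sigma^2\right) h + \frac{1}{2}\sigma\sqrt{h} \cdot \varepsilon_m\right)^2$$

We can show that this approximation is better than the Euler-Maruyama or Milstein
approximation directly applied to the SDE (13.5).

15 Because $\partial_y \sigma'(t, Y) = 0$.
16 The drift can be written as $\alpha + \beta X(t) = -\beta\left(-\alpha\beta^{-1} - X(t)\right)$. We deduce that $a = -\beta$ and $b = -\alpha\beta^{-1}$.
17 We have $\partial_x f(t,x) = \frac{1}{2} x^{-1/2}$ and $\partial_x^2 f(t,x) = -\frac{1}{4} x^{-3/2}$.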
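The transformed scheme for the Cox-Ingersoll-Ross process can be sketched as follows: the Euler-Maruyama recursion is applied to $Y = \sqrt{X}$, whose drift is $\left(\alpha + \beta Y^2 - \sigma^2/4\right)/(2Y)$ and whose diffusion is the constant $\sigma/2$, and the result is squared. The function name `simulate_cir_sqrt` and the parameter values are assumptions chosen so that the process stays well away from zero.

```python
import numpy as np

def simulate_cir_sqrt(x0, alpha, beta, sigma, h, n_steps, n_paths, rng):
    """Euler-Maruyama scheme applied to Y = sqrt(X) for the CIR process
    dX = (alpha + beta X) dt + sigma sqrt(X) dW; returns X at the horizon."""
    y = np.full(n_paths, np.sqrt(x0))
    for _ in range(n_steps):
        eps = rng.standard_normal(n_paths)
        drift = (alpha + beta * y**2 - 0.25 * sigma**2) / (2.0 * y)
        y = y + drift * h + 0.5 * sigma * np.sqrt(h) * eps
    return y**2  # transformation method: X_m = Y_m^2

# Hypothetical parameters: a = -beta = 2 and b = -alpha / beta = 5%
x_T = simulate_cir_sqrt(0.03, 0.10, -2.0, 0.10, 1.0 / 260, 1300, 2000,
                        np.random.default_rng(0))
```

With these values the long-run mean is $b = -\alpha/\beta = 5\%$, so the sample mean of `x_T` after five years should be close to 0.05.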


Remark 158 Generally, we choose the Lamperti transform $Y(t) = f(X(t))$ in order to
obtain a constant diffusion term ($\partial_y \sigma'(t,y) = 0$). This implies that:

$$f(x) = c \int^{x} \frac{\mathrm{d}u}{\sigma(t,u)}$$

Because we have $\partial_x f(t,x) = c/\sigma(t,x)$ and $\partial_x^2 f(t,x) = -c\,\partial_x \sigma(t,x)/\sigma^2(t,x)$, we obtain:

$$\mathrm{d}Y(t) = c\left(\frac{\mu(t,X)}{\sigma(t,X)} - \frac{1}{2}\partial_x \sigma(t,X)\right)\mathrm{d}t + c\,\mathrm{d}W(t)$$

In this case, the Euler-Maruyama scheme coincides with the Milstein scheme. Most of the
time, the approximation $X_m = f^{-1}(Y_m)$ gives better results than those obtained with a
Milstein method applied to the process $X(t)$.


13.2.2.5 Poisson processes 


We have seen that simulating a Poisson process $N(t)$ with constant intensity $\lambda$ is
straightforward, because the inter-arrival times are independent and exponentially distributed
with parameter $\lambda$. Let $t_m$ be the time when the $m$-th event occurs. The numerical
algorithm is then:

1. we set $t_0 = 0$ and $N(t_0) = 0$;

2. we generate a uniform random variate $u$ and calculate the random variate $e \sim \mathcal{E}(\lambda)$
with the formula:

$$e = -\frac{\ln u}{\lambda}$$

3. we update the Poisson process with:

$$t_{m+1} \leftarrow t_m + e \quad \text{and} \quad N(t_{m+1}) \leftarrow N(t_m) + 1$$

4. we go back to step 2.


We can also use this algorithm to simulate mixed Poisson processes (MPP), which are defined
as Poisson processes with stochastic intensity $\Lambda$. In this case, the algorithm is initialized
with a realization $\lambda$ of the random intensity $\Lambda$. On the contrary, this method is not valid
in the case of non-homogeneous Poisson processes (NHPP), where the intensity $\lambda(t)$ varies
with time$^{18}$. However, we can show that the inter-arrival times remain independent and
exponentially distributed with:

$$\Pr\{T_1 > t\} = \exp\left(-\Lambda(t)\right)$$

where $T_1$ is the duration of the first event and $\Lambda(t)$ is the integrated intensity function:

$$\Lambda(t) = \int_0^t \lambda(s)\,\mathrm{d}s$$

It follows that:

$$\Pr\left\{T_1 > \Lambda^{-1}(t)\right\} = \exp(-t) \Leftrightarrow \Pr\left\{\Lambda(T_1) > t\right\} = \exp(-t)$$

We deduce that if $\{t_1, t_2, \ldots, t_m\}$ are the occurrence times of the NHPP of intensity $\lambda(t)$,
then $\{\Lambda(t_1), \Lambda(t_2), \ldots, \Lambda(t_m)\}$ are the occurrence times of the homogeneous Poisson process
(HPP) of intensity one. Therefore, the algorithm is:

1. we simulate $t'_m$, the arrival times of the homogeneous Poisson process with intensity
$\lambda = 1$;

2. we apply the transform $t_m = \Lambda^{-1}(t'_m)$.

To implement this algorithm, we need to compute the inverse function $\Lambda^{-1}(t)$. When there
is no analytical expression, this algorithm may be time-consuming, in particular when $\Lambda(t)$
is calculated with a method of numerical integration. Another approach consists in using
the acceptance-rejection algorithm for simulating the NHPP over the period $[0, T]$:

18 Indeed, we don't know the value of the intensity to use at the second step of the algorithm.


1. we set $\lambda^{+} = \max_{t \leq T} \lambda(t)$, $t = 0$, $t_0 = 0$ and $N(t_0) = 0$;

2. we generate a uniform random variate $u$ and calculate the random variate $e \sim \mathcal{E}(\lambda^{+})$
with the formula:

$$e = -\frac{\ln u}{\lambda^{+}}$$

3. we calculate $t = t + e$;

4. if $t > T$, we stop the algorithm;

5. we generate a uniform random variate $v$; if $v \leq \lambda(t)/\lambda^{+}$, then we accept the arrival
time:

$$t_{m+1} \leftarrow t \quad \text{and} \quad N(t_{m+1}) \leftarrow N(t_m) + 1$$

else we reject it;

6. we go back to step 2.
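The acceptance-rejection steps above can be sketched as follows. The function name `simulate_nhpp` is hypothetical, and the cyclical intensity used in the usage line, with angular frequency $2\pi$, is only an illustrative assumption.

```python
import math
import random

def simulate_nhpp(intensity, lam_max, horizon, rng):
    """Acceptance-rejection simulation of a non-homogeneous Poisson process
    on [0, horizon]; lam_max must dominate intensity(t) on that interval."""
    t = 0.0
    arrivals = []
    while True:
        u = 1.0 - rng.random()           # uniform variate in (0, 1]
        t += -math.log(u) / lam_max      # candidate from the HPP(lam_max)
        if t > horizon:
            break
        v = rng.random()
        if v <= intensity(t) / lam_max:  # accept with probability lambda(t)/lam_max
            arrivals.append(t)
    return arrivals

# Hypothetical cyclical intensity taking values in [10, 170]
arrivals = simulate_nhpp(lambda t: 90.0 + 80.0 * math.sin(2.0 * math.pi * t),
                         170.0, 10.0, random.Random(42))
```

Over ten full cycles the integrated intensity is $90 \times 10 = 900$, so the number of accepted arrivals should be close to 900.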


In Figure 13.22, we simulate a non-homogenous Poisson process with the following in- 
tensity function: 


6 
A (t) = 90 + 80- sin (= +t) 
Since $\lambda(t)$ is a cyclical function and $\lambda(t) \in [10, 170]$, the intensity function can vary very
quickly. In the bottom/right panel, we draw the histogram of the arrival process for the interval
$[t, t + \mathrm{d}t]$ and compare it with the expected arrival frequency, which is equal to:

$$\mathbb{E}\left[N(t + \mathrm{d}t) - N(t)\right] = \int_t^{t + \mathrm{d}t} \lambda(s)\,\mathrm{d}s \approx \lambda(t)\,\mathrm{d}t$$


The compound Poisson process $Y(t)$ is defined by:

$$Y(t) = \sum_{i=1}^{N(t)} X_i$$

FIGURE 13.22: Simulation of a non-homogeneous Poisson process with cyclical intensity (panels: simulated Poisson process; observed arrival frequency, $\mathrm{d}t = 1/10$; expected arrival frequency, $\mathrm{d}t = 1/10$)

where $N(t)$ is a Poisson process of intensity $\lambda$ and $\{X_i\}_{i \geq 1}$ is a sequence of iid random
variables with distribution function $F$. This process is a generalization of the Poisson process
by assuming that jump sizes are not equal to one, but are random. Let $\{t_1, t_2, \ldots, t_m\}$ be
the arrival times of the Poisson process. We have:

$$Y(t) = Y(t_m) \quad \text{if } t \in [t_m, t_{m+1})$$

and:

$$Y(t_{m+1}) = Y(t_m) + X_{m+1}$$

where $X_{m+1}$ is generated from the distribution function $F$. Another method to simulate
$Y(t)$ is to use the following property: conditionally to $N(T) = n$, the arrival times
$\{t_1, t_2, \ldots, t_n\}$ of the Poisson process on the interval $[0, T]$ are distributed as $n$ independent
ordered uniform random variables. We deduce this algorithm:

1. we simulate the number $n$ of jumps on the time interval $[0, T]$ by generating a Poisson
random variable with parameter $\lambda T$;

2. we simulate $n$ uniform random variates $(u_1, \ldots, u_n)$ and sort them$^{19}$:

$$u_{1:n} \leq u_{2:n} \leq \cdots \leq u_{n:n}$$

3. we simulate $n$ random variates $(x_1, \ldots, x_n)$ from the probability distribution $F$;

4. we finally generate the compound Poisson process by:

$$Y(t) = \sum_{i=1}^{n} \mathbb{1}\left\{T \cdot u_{i:n} \leq t\right\} \cdot x_i$$

19 The arrival times are given by the formula: $t_m = T \cdot u_{m:n}$.
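The ordered-uniform algorithm for the compound Poisson process can be sketched as follows. The function name `simulate_compound_poisson` and the exponential jump distribution in the usage lines are assumptions of this illustration; any sampler for $F$ can be passed instead.

```python
import numpy as np

def simulate_compound_poisson(lam, horizon, jump_sampler, rng):
    """Compound Poisson process on [0, horizon] via ordered uniforms.

    jump_sampler(n) must return n iid draws from the jump distribution F.
    """
    n = rng.poisson(lam * horizon)                 # step 1: number of jumps
    times = horizon * np.sort(rng.uniform(size=n)) # step 2: t_m = T * u_{m:n}
    jumps = np.asarray(jump_sampler(n))            # step 3: jump sizes

    def y(t):
        # step 4: sum of the jumps that occurred up to time t
        return jumps[times <= t].sum()

    return times, jumps, y

rng = np.random.default_rng(1)
times, jumps, y = simulate_compound_poisson(
    5.0, 2.0, lambda n: rng.exponential(1.0, n), rng)
```

The returned function `y` is piecewise constant: it is zero before the first arrival and equals the sum of all jumps at the horizon.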
13.2.2.6 Jump-diffusion processes
A jump-diffusion model is a process that is generally defined by:

$$\mathrm{d}X(t) = \mu(t, X)\,\mathrm{d}t + \sigma(t, X)\,\mathrm{d}W(t) + \eta(t, X^{-})\,\mathrm{d}J(t) \qquad (13.6)$$

where $J(t)$ is a jump process. Between two jumps, $\mathrm{d}J(t)$ is equal to zero and the process
$X(t)$ is continuous and evolves according to the SDE:

$$\mathrm{d}X(t) = \mu(t, X)\,\mathrm{d}t + \sigma(t, X)\,\mathrm{d}W(t)$$

When a jump occurs, the process is discontinuous and we have:

$$X(t) = X(t^{-}) + \eta\left(t^{-}, X(t^{-})\right)\mathrm{d}J(t)$$

The jump process may be a Poisson process $N(t)$ with intensity $\lambda$ or a compound Poisson
process $Y(t) = \sum_{i=1}^{N(t)} Z_i$ where $\{Z_i\}_{i \geq 1}$ is a sequence of iid random variables with
distribution $F$.

In the case $J(t) = N(t)$, the Euler scheme is:

$$\begin{aligned}
X(t_{m+1}) = X(t_m) &+ \mu(t_m, X(t_m)) \cdot (t_{m+1} - t_m) \\
&+ \sigma(t_m, X(t_m)) \cdot \left(W(t_{m+1}) - W(t_m)\right) \\
&+ \eta(t_m, X(t_m)) \cdot \left(N(t_{m+1}) - N(t_m)\right)
\end{aligned}$$

We finally obtain:

$$X_{m+1} = X_m + \mu(t_m, X_m) \cdot (t_{m+1} - t_m) + \sigma(t_m, X_m)\sqrt{t_{m+1} - t_m} \cdot \varepsilon_m + \eta(t_m, X_m) \cdot \xi_m \qquad (13.7)$$

where $\varepsilon_m \sim \mathcal{N}(0,1)$ and $\xi_m \sim \mathcal{P}\left(\lambda(t_{m+1} - t_m)\right)$. We have:

$$\begin{aligned}
\Pr\{\xi_m = 0\} &= e^{-\lambda(t_{m+1} - t_m)} \\
\Pr\{\xi_m = 1\} &= e^{-\lambda(t_{m+1} - t_m)}\,\lambda(t_{m+1} - t_m) \\
\Pr\{\xi_m \geq 2\} &= 1 - e^{-\lambda(t_{m+1} - t_m)}\left(1 + \lambda(t_{m+1} - t_m)\right)
\end{aligned}$$

If we assume that the stepsize $t_{m+1} - t_m$ is small, we obtain $\Pr\{\xi_m = 0\} \approx 1 - \lambda(t_{m+1} - t_m)$,
$\Pr\{\xi_m = 1\} \approx \lambda(t_{m+1} - t_m)$ and $\Pr\{\xi_m \geq 2\} \approx 0$. Therefore, we can generate $\xi_m$ by:

$$\xi_m = \begin{cases} 1 & \text{if } \lambda(t_{m+1} - t_m) \geq u_m \\ 0 & \text{otherwise} \end{cases}$$

where $u_m$ is a uniform random variate. Another way to simulate $X(t)$ is to first simulate the
arrival times of the Poisson process. We denote these times by $\tau_1, \tau_2, \ldots$ and combine
this grid with the initial grid $t_1, t_2, \ldots, t_m$. We then apply the Euler scheme (13.7) on the
augmented grid, but we are now sure that we cannot have more than one jump between
two discretization times. We illustrate this algorithm by considering the SDE:

$$\mathrm{d}X(t) = 0.15 \cdot X(t)\,\mathrm{d}t + 0.20 \cdot X(t)\,\mathrm{d}W(t) + \left(30 - 0.30 \cdot X(t^{-})\right) \cdot \mathrm{d}J(t)$$

A simulated path is given in Figure 13.23, where the jumps are indicated by a dashed line.
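The augmented-grid approach can be sketched as follows for the SDE above with $J(t) = N(t)$. The function name `simulate_jump_diffusion`, the initial value, the horizon and the jump intensity are assumptions of this illustration, because they are not stated in the text.

```python
import numpy as np

def simulate_jump_diffusion(x0, horizon, n_steps, lam, rng):
    """Euler scheme (13.7) on a regular grid augmented with the Poisson
    arrival times, for dX = 0.15 X dt + 0.20 X dW + (30 - 0.30 X) dN."""
    n_jumps = rng.poisson(lam * horizon)
    jump_times = np.sort(rng.uniform(0.0, horizon, n_jumps))
    # Merge the regular grid and the jump times (sorted, without duplicates)
    grid = np.union1d(np.linspace(0.0, horizon, n_steps + 1), jump_times)
    is_jump = np.isin(grid, jump_times)
    x = np.empty(grid.size)
    x[0] = x0
    for m in range(grid.size - 1):
        dt = grid[m + 1] - grid[m]
        eps = rng.standard_normal()
        xi = 1.0 if is_jump[m + 1] else 0.0  # at most one jump per interval
        x[m + 1] = (x[m] + 0.15 * x[m] * dt + 0.20 * x[m] * np.sqrt(dt) * eps
                    + (30.0 - 0.30 * x[m]) * xi)
    return grid, x, jump_times

# Hypothetical settings: x0 = 100, five years, weekly grid, lambda = 1
grid, x, jump_times = simulate_jump_diffusion(100.0, 5.0, 260, 1.0,
                                              np.random.default_rng(3))
```

Since every arrival time is a grid point, the increment of the Poisson process over each interval is either 0 or 1, which is exactly the property the augmented grid is meant to guarantee.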


In the case of the compound Poisson process $J(t) = Y(t)$, we can obtain explicit solutions
for some processes. For instance, the model of Merton (1976) considers that the
continuous part is a geometric Brownian motion:

$$\mathrm{d}X(t) = \mu X(t)\,\mathrm{d}t + \sigma X(t)\,\mathrm{d}W(t) + X(t^{-})\,\mathrm{d}J(t)$$

FIGURE 13.23: Simulation of a jump-diffusion process

If we assume that the $i$-th jump occurs at time $t$, we obtain$^{20}$:

$$X(t) = X(t^{-}) + X(t^{-})\,Z_i = (1 + Z_i)\,X(t^{-})$$

We deduce that:

$$X(t) = X(0)\exp\left(\left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma W(t)\right)\prod_{i=1}^{N(t)}(1 + Z_i)$$

In the general case, the Euler scheme is:

$$X_{m+1} = X_m + \mu(t_m, X_m) \cdot (t_{m+1} - t_m) + \sigma(t_m, X_m)\sqrt{t_{m+1} - t_m} \cdot \varepsilon_m + \eta(t_m, X_m) \cdot \xi_m$$

where $\varepsilon_m \sim \mathcal{N}(0,1)$ and $\xi_m = Y(t_{m+1}) - Y(t_m)$. As we have previously presented an
algorithm to generate $Y(t)$, there is no difficulty in simulating $X(t)$.


13.2.2.7 Processes related to Brownian motion 


We have previously shown how to simulate a stochastic differential equation by assuming
the initial position of the random process. In finance, we also need to simulate stochastic
processes with other constraints (Brownian bridge, Brownian meander) or statistics of the
SDE (minimum, maximum, stopping time).

20 We assume that $Z_i > -1$.

FIGURE 13.24: Simulation of the Brownian bridge $B_1(t)$ using the time reversibility
property (panels: Brownian motion $W(t)$; Brownian motion $W'(t)$; reversed Brownian motion $W'(1-t)$; Brownian bridge $B_1(t)$)


A Brownian bridge $B_r(t)$ is a Brownian motion $W(t)$ such that $W(0) = 0$ and $W(1) = r$
(Revuz and Yor, 1999). For $t \in [0,1]$, we have$^{21}$:

$$B_r(t) = W(t) + \left(r - W(1)\right) \cdot t$$

Devroye (2010) noticed that:

$$W(1) = W(t) + \left(W(1) - W(t)\right)$$

The time reversibility property of the Brownian motion implies that $W(1) - W(t) \overset{\mathcal{L}}{=} W'(1 - t)$,
where $W'$ is a second Brownian motion independent of $W(t)$. It follows that:

$$\begin{aligned}
B_r(t) &= W(t) + \left(r - \left(W(t) + W'(1 - t)\right)\right) \cdot t \\
&= r \cdot t + (1 - t) \cdot W(t) - t \cdot W'(1 - t)
\end{aligned}$$

Figure 13.24 illustrates the simulation of $B_r(t)$ by using two simulated paths $W(t)$ and
$W'(t)$. We also notice that:

$$\begin{aligned}
B_r(t) &= r \cdot t + (1 - t) \cdot \sqrt{t} \cdot \varepsilon_1 - t \cdot \sqrt{1 - t} \cdot \varepsilon_2 \\
&= r \cdot t + \sqrt{t(1 - t)} \cdot \varepsilon
\end{aligned}$$

where $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon$ are standard Gaussian random variables. If we now assume that $s \leq t \leq u$,
$W(s) = w_s$, and $W(u) = w_u$, the Brownian bridge becomes:

$$B(t) = \frac{(u - t) \cdot w_s + (t - s) \cdot w_u}{u - s} + \sqrt{\frac{(t - s)(u - t)}{u - s}} \cdot \varepsilon$$

because of the scaling property of the Brownian motion$^{22}$. If we consider the simulation
of $B(t)$ for different values $t_m \in [s, u]$, we proceed by filling the path with the iterative
algorithm:

1. we initialize the algorithm with $m = 1$;

2. we simulate the Brownian bridge $B(t_m)$ such that $B(s) = w_s$ and $B(u) = w_u$;

3. we set $s = t_m$, $w_s = B(t_m)$ and $m \leftarrow m + 1$;

4. we go back to step 2.

21 We verify that the increments of $B_r(t)$ are independent, $B_r(0) = 0$ and $B_r(1) = r$.


In Figure 13.25, we report 5 simulations of the Brownian Bridge B (t) such that B (0) = 0, 
B(1) =1, B(3) =3 and B (5) = 2. 
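The iterative filling algorithm can be sketched as follows for the pinned values $B(0) = 0$, $B(1) = 1$, $B(3) = 3$ and $B(5) = 2$. It uses the standard conditional law of a Brownian motion given $W(s) = w_s$ and $W(u) = w_u$, which is Gaussian with mean $\left((u-t) w_s + (t-s) w_u\right)/(u-s)$ and variance $(t-s)(u-t)/(u-s)$. The function name `simulate_bridge` and the grid step are assumptions of this illustration.

```python
import numpy as np

def simulate_bridge(grid, pinned, rng):
    """Fill a Brownian path on `grid` from left to right.

    `pinned` maps grid indices to fixed values; the first and last grid
    indices must both be pinned.
    """
    pin_idx = sorted(pinned)
    path = np.empty(len(grid))
    for left, right in zip(pin_idx[:-1], pin_idx[1:]):
        path[left], path[right] = pinned[left], pinned[right]
        u, w_u = grid[right], pinned[right]
        for i in range(left + 1, right):
            s, w_s, t = grid[i - 1], path[i - 1], grid[i]
            mean = ((u - t) * w_s + (t - s) * w_u) / (u - s)
            std = np.sqrt((t - s) * (u - t) / (u - s))
            path[i] = mean + std * rng.standard_normal()
    return path

grid = np.arange(0, 501) / 100.0   # step 0.01 on [0, 5]
pinned = {0: 0.0, 100: 1.0, 300: 3.0, 500: 2.0}
bridge = simulate_bridge(grid, pinned, np.random.default_rng(0))
```

Pinning by grid index rather than by time value avoids floating-point equality tests when checking whether a date is one of the fixed points.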


FIGURE 13.25: Simulation of the Brownian bridge B (t) 


To simulate a process $X(t)$ with fixed values at times $\tau_1, \tau_2, \ldots$, we assume that we
have an explicit solution $X(t) = g(W(t))$ implying that $W(t) = g^{-1}(X(t))$. Simulating
a diffusion bridge $X(t)$ consists then in generating the Brownian bridge $B(t)$ such that
$W(\tau_j) = g^{-1}(X(\tau_j))$, and applying the transformation $X(t) = g(B(t))$. For instance, if
we consider the geometric Brownian motion, we have:

$$X(t) = g(W(t)) = x_0 \cdot \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma W(t)\right)$$

and:

$$W(t) = g^{-1}(X(t)) = \frac{\ln X(t) - \ln x_0 - \left(\mu - \frac{1}{2}\sigma^2\right) t}{\sigma}$$

We assume that $x_0 = 100$, $\mu = 0$ and $\sigma = 10\%$. The fixed values of $X(t)$ are given in
the table below. Using the previous formula, we deduce the values taken by the Brownian
bridge:

$\tau_j$    $X(\tau_j)$    $W(\tau_j)$
0        100        0.0000
1        110        1.0031
3        100        0.1500
5         90       −0.8036

We have reported five simulated paths of this diffusion bridge in Figure 13.26.

22 See also Exercise 13.4.8 on page 891 for an alternative proof (Glasserman, 2003).
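The values of the Brownian bridge in the table can be checked with a few lines of code that apply the inverse map $g^{-1}$ of the geometric Brownian motion. The function name `gbm_to_brownian` is an assumption of this illustration.

```python
import math

def gbm_to_brownian(x, t, x0, mu, sigma):
    """Inverse map W(t) = g^{-1}(X(t)) for the geometric Brownian motion."""
    return (math.log(x) - math.log(x0) - (mu - 0.5 * sigma**2) * t) / sigma

# Fixed points (tau_j, X(tau_j)) with x0 = 100, mu = 0 and sigma = 10%
fixed_points = [(0, 100.0), (1, 110.0), (3, 100.0), (5, 90.0)]
w_values = [gbm_to_brownian(x, t, 100.0, 0.0, 0.10) for t, x in fixed_points]
```

The resulting values are then used as the pinned points of the Brownian bridge, and the simulated bridge is mapped back with $X(t) = g(B(t))$.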


FIGURE 13.26: Simulation of the diffusion bridge X (t) 


Diffusion bridges are important in finance when we would like to study extremes of a
diffusion process. If we want to find the maximum of the stochastic process $X(t)$ over $[0, T]$,
we can simulate $X(t)$ and take the maximum of the generated path:

$$\hat{M} = \max_{m} X_m$$

Another approach consists in locating the maximum:

$$m^{\star} = \arg\max_{m} X_m$$

and simulating the diffusion bridge $B(t)$ such that $X(t_{m^{\star}-1}) = X_{m^{\star}-1}$, $X(t_{m^{\star}}) = X_{m^{\star}}$
and $X(t_{m^{\star}+1}) = X_{m^{\star}+1}$. In this case, we can define another estimator of the maximum:

$$\tilde{M} = \max_{t \in [t_{m^{\star}-1},\, t_{m^{\star}+1}]} B(t)$$

By construction, we always have $\tilde{M} \geq \hat{M}$. For instance, we report the probability density
function of $\hat{M}$ and $\tilde{M}$ in Figure 13.27 when we consider the geometric Brownian motion
with $x_0 = 100$, $\mu = 0$, $\sigma = 15\%$ and $T = 1$. The GBM process has been simulated with
a fixed stepsize $h = 0.1$, whereas the diffusion bridge has been simulated with $h = 0.001$.
This implies that each path uses $1/0.1 = 10$ discretization points in the first case and
$10 + 0.2/0.001 = 210$ discretization points in the second case. The estimation based on the
diffusion bridge is then equivalent to considering a scheme with $1/0.001 = 1000$ discretization
points.

FIGURE 13.27: Density of the maximum estimators $\hat{M}$ and $\tilde{M}$


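Remark 159 In the case of the geometric Brownian motion $X(t)$, the distribution function
of the maximum is known. Indeed, we have$^{23}$:

$$\Pr\{M(t) \geq x\} = \exp\left(\frac{2\eta x}{\sigma^2}\right)\Phi\left(\frac{-x - \eta t}{\sigma\sqrt{t}}\right) + \Phi\left(\frac{-x + \eta t}{\sigma\sqrt{t}}\right)$$

where $M(t)$ is the maximum of a Brownian motion with a constant drift:

$$M(t) = \max_{s \leq t}\, \eta s + \sigma W(s)$$

We notice that $\ln X(t) = \ln x_0 + \left(\mu - \frac{1}{2}\sigma^2\right) t + \sigma W(t)$, meaning that $\eta = \mu - \frac{1}{2}\sigma^2$.
It follows that:

$$\Pr\left\{\max_{s \leq t} X(s) \geq x\right\} = \Pr\left\{\max_{s \leq t} \ln X(s) \geq \ln x\right\} = \Pr\left\{M(t) \geq \ln x - \ln x_0\right\}$$

23 In the case of the minimum, we can use the following identity:

$$m(t) = \min_{s \leq t}\, \eta s + \sigma W(s) = -\max_{s \leq t}\left(-\eta s - \sigma W(s)\right)$$

It follows that:

$$\Pr\{m(t) \leq x\} = \exp\left(\frac{2\eta x}{\sigma^2}\right)\Phi\left(\frac{x + \eta t}{\sigma\sqrt{t}}\right) + \Phi\left(\frac{x - \eta t}{\sigma\sqrt{t}}\right)$$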

Remark 160 Diffusion bridges are extensively used when pricing look-back options by 
Monte Carlo, but also barrier options. Indeed, we need to locate precisely the stopping time 
when the process crosses the barrier. More generally, they may accelerate the convergence 
of Monte Carlo methods in the case of path-dependent derivatives (Glasserman, 2003). 


13.2.3 Multivariate continuous-time processes 
13.2.3.1 Multidimensional Brownian motion 


Let $W(t) = (W_1(t), \ldots, W_n(t))$ be an $n$-dimensional Brownian motion. Each component
$W_i(t)$ is a Brownian motion:

$$W_i(t) - W_i(s) \sim \mathcal{N}(0, t - s)$$

Moreover, we have:

$$\mathbb{E}\left[W_i(t)\,W_j(s)\right] = \min(t, s) \cdot \rho_{i,j}$$

where $\rho_{i,j}$ is the correlation between the two Brownian motions $W_i$ and $W_j$. We deduce
that:

$$\begin{cases} W(0) = \mathbf{0} \\ W(t) = W(s) + \varepsilon(s,t) \end{cases}$$

where $\varepsilon(s,t) \sim \mathcal{N}_n\left(\mathbf{0}, (t - s)\,\rho\right)$ are iid random vectors. It follows that the numerical solution
is:

$$W_{m+1} = W_m + \sqrt{t_{m+1} - t_m} \cdot P \cdot \varepsilon_m$$

where $P$ is the Cholesky decomposition of the correlation matrix $\rho$ (such that $\rho = P P^{\top}$) and $\varepsilon_m \sim \mathcal{N}_n(\mathbf{0}, I)$ are
iid random vectors. In the case of fixed-interval times, the recursion becomes:

$$W_{m+1} = W_m + \sqrt{h} \cdot P \cdot \varepsilon_m$$

In Figures 13.28 and 13.29, we simulate the realization of two-dimensional Brownian
motions. Since the two simulated paths use the same random numbers, the difference comes
from the correlation $\rho_{1,2}$, which is equal to zero for the first case and 85% for the second
case.
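The fixed-interval recursion for correlated Brownian motions can be sketched as follows. The function name `simulate_correlated_bm`, the step size and the number of steps are assumptions of this illustration.

```python
import numpy as np

def simulate_correlated_bm(rho, h, n_steps, rng):
    """Simulate W_{m+1} = W_m + sqrt(h) P eps_m with rho = P P'."""
    n = rho.shape[0]
    P = np.linalg.cholesky(rho)               # lower triangular factor
    eps = rng.standard_normal((n_steps, n))   # rows are eps_m ~ N(0, I)
    increments = np.sqrt(h) * eps @ P.T       # rows are sqrt(h) P eps_m
    return np.vstack([np.zeros(n), np.cumsum(increments, axis=0)])

rho = np.array([[1.0, 0.85], [0.85, 1.0]])
W = simulate_correlated_bm(rho, 1.0 / 260, 20000, np.random.default_rng(0))
sample_corr = np.corrcoef(np.diff(W, axis=0).T)[0, 1]
```

The empirical correlation of the simulated increments should be close to the target value of 85%.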


13.2.3.2 Multidimensional geometric Brownian motion 
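Let us now consider the multidimensional geometric Brownian motion$^{24}$:

$$\begin{cases} \mathrm{d}X(t) = \mu \odot X(t)\,\mathrm{d}t + \operatorname{diag}\left(\sigma \odot X(t)\right)\mathrm{d}W(t) \\ X(0) = x_0 \end{cases}$$

where $X(t) = (X_1(t), \ldots, X_n(t))$, $\mu = (\mu_1, \ldots, \mu_n)$, $\sigma = (\sigma_1, \ldots, \sigma_n)$ and $W(t) =
(W_1(t), \ldots, W_n(t))$ is an $n$-dimensional Brownian motion with $\mathbb{E}\left[W(t)\,W(t)^{\top}\right] = \rho\,t$. If
we consider the $j$-th component of $X(t)$, we have:

$$\mathrm{d}X_j(t) = \mu_j X_j(t)\,\mathrm{d}t + \sigma_j X_j(t)\,\mathrm{d}W_j(t)$$

The solution of the multidimensional SDE is a multivariate log-normal process with:

$$X_j(t) = X_j(0) \cdot \exp\left(\left(\mu_j - \frac{1}{2}\sigma_j^2\right) t + \sigma_j W_j(t)\right)$$

where $W(t) \sim \mathcal{N}_n(\mathbf{0}, \rho\,t)$. We deduce that the exact scheme to simulate the multivariate
GBM is:

$$\begin{cases}
X_{1,m+1} = X_{1,m} \cdot \exp\left(\left(\mu_1 - \frac{1}{2}\sigma_1^2\right)(t_{m+1} - t_m) + \sigma_1\sqrt{t_{m+1} - t_m} \cdot \varepsilon_{1,m}\right) \\
\quad \vdots \\
X_{j,m+1} = X_{j,m} \cdot \exp\left(\left(\mu_j - \frac{1}{2}\sigma_j^2\right)(t_{m+1} - t_m) + \sigma_j\sqrt{t_{m+1} - t_m} \cdot \varepsilon_{j,m}\right) \\
\quad \vdots \\
X_{n,m+1} = X_{n,m} \cdot \exp\left(\left(\mu_n - \frac{1}{2}\sigma_n^2\right)(t_{m+1} - t_m) + \sigma_n\sqrt{t_{m+1} - t_m} \cdot \varepsilon_{n,m}\right)
\end{cases}$$

where $(\varepsilon_{1,m}, \ldots, \varepsilon_{n,m}) \sim \mathcal{N}_n(\mathbf{0}, \rho)$.

24 The symbol $\odot$ is the Hadamard product.

FIGURE 13.28: Brownian motion in the plane (independent case)

FIGURE 13.29: Brownian motion in the plane ($\rho_{1,2} = 85\%$)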


Remark 161 Monte Carlo methods extensively use this scheme for calculating the price of 
multi-asset derivatives in the Black-Scholes model. 


13.2.3.3  Euler-Maruyama and Milstein schemes 
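We consider the general SDE:

$$\begin{cases} \mathrm{d}X(t) = \mu(t, X(t))\,\mathrm{d}t + \sigma(t, X(t))\,\mathrm{d}W(t) \\ X(0) = x_0 \end{cases}$$

where $X(t)$ and $\mu(t, X(t))$ are $n \times 1$ vectors, $\sigma(t, X(t))$ is an $n \times p$ matrix and $W(t)$ is a
$p \times 1$ vector. We assume that $\mathbb{E}\left[W(t)\,W(t)^{\top}\right] = \rho\,t$, where $\rho$ is a $p \times p$ correlation matrix.
The corresponding Euler-Maruyama scheme is:

$$X_{m+1} = X_m + \mu(t_m, X_m) \cdot (t_{m+1} - t_m) + \sigma(t_m, X_m)\sqrt{t_{m+1} - t_m} \cdot \varepsilon_m$$

where $\varepsilon_m \sim \mathcal{N}_p(\mathbf{0}, \rho)$. In the case of a diagonal system$^{25}$, we retrieve the one-dimensional
scheme:

$$X_{j,m+1} = X_{j,m} + \mu_j(t_m, X_{j,m}) \cdot (t_{m+1} - t_m) + \sigma_{j,j}(t_m, X_{j,m}) \cdot \sqrt{t_{m+1} - t_m} \cdot \varepsilon_{j,m}$$

However, the random variables $\varepsilon_{j,m}$ and $\varepsilon_{j',m}$ may be correlated.

25 This means that $\mu_j(t, x) = \mu_j(t, x_j)$ and $\sigma(t, x)$ is an $n \times n$ diagonal matrix with $\sigma_{j,j}(t, x) = \sigma_{j,j}(t, x_j)$.

Example 148 We consider the Heston model:

$$\begin{cases} \mathrm{d}X(t) = \mu X(t)\,\mathrm{d}t + \sqrt{v(t)}\,X(t)\,\mathrm{d}W_1(t) \\ \mathrm{d}v(t) = a\left(b - v(t)\right)\mathrm{d}t + \sigma\sqrt{v(t)}\,\mathrm{d}W_2(t) \end{cases}$$

where $\mathbb{E}\left[W_1(t)\,W_2(t)\right] = \rho\,t$. By applying the fixed-interval Euler-Maruyama scheme to
$(\ln X(t), v(t))$, we obtain:

$$\ln X_{m+1} = \ln X_m + \left(\mu - \frac{1}{2} v_m\right) h + \sqrt{v_m h} \cdot \varepsilon_{1,m}$$

and$^{26}$:

$$v_{m+1} = v_m + a\,(b - v_m)\,h + \sigma\sqrt{v_m h} \cdot \varepsilon_{2,m}$$

Here, $\varepsilon_{1,m}$ and $\varepsilon_{2,m}$ are two standard Gaussian random variables with correlation $\rho$.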


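The multidimensional version of the Milstein scheme is$^{27}$:

$$X_{j,m+1} = X_{j,m} + \mu_j(t_m, X_m)\,(t_{m+1} - t_m) + \sum_{k=1}^{p} \sigma_{j,k}(t_m, X_m)\,\Delta W_{k,m} + \sum_{k=1}^{p}\sum_{k'=1}^{p} \mathcal{L}^{(k)}\sigma_{j,k'}(t_m, X_m)\,\mathcal{I}_{(k,k')}$$

where:

$$\mathcal{L}^{(k)} f(t, x) = \sum_{k''=1}^{n} \sigma_{k'',k}(t, x)\,\frac{\partial f(t, x)}{\partial x_{k''}}$$

and:

$$\mathcal{I}_{(k,k')} = \int_{t_m}^{t_{m+1}} \int_{t_m}^{s} \mathrm{d}W_k(t)\,\mathrm{d}W_{k'}(s)$$

In the case of a diagonal system, the Milstein scheme may be simplified as follows:

$$X_{j,m+1} = X_{j,m} + \mu_j(t_m, X_{j,m})\,(t_{m+1} - t_m) + \sigma_{j,j}(t_m, X_{j,m})\,\Delta W_{j,m} + \mathcal{L}^{(j)}\sigma_{j,j}(t_m, X_{j,m})\,\mathcal{I}_{(j,j)}$$

where$^{28}$:

$$\begin{aligned}
\mathcal{I}_{(j,j)} &= \int_{t_m}^{t_{m+1}} \int_{t_m}^{s} \mathrm{d}W_j(t)\,\mathrm{d}W_j(s) \\
&= \int_{t_m}^{t_{m+1}} \left(W_j(s) - W_j(t_m)\right)\mathrm{d}W_j(s) \\
&= \frac{1}{2}\left(\left(\Delta W_{j,m}\right)^2 - (t_{m+1} - t_m)\right)
\end{aligned}$$

We deduce that the Milstein scheme is:

$$\begin{aligned}
X_{j,m+1} = X_{j,m} &+ \mu_j(t_m, X_{j,m})\,(t_{m+1} - t_m) + \sigma_{j,j}(t_m, X_{j,m})\sqrt{t_{m+1} - t_m}\,\varepsilon_{j,m} \\
&+ \frac{1}{2}\,\sigma_{j,j}(t_m, X_{j,m})\,\partial_{x_j}\sigma_{j,j}(t_m, X_{j,m})\,(t_{m+1} - t_m)\left(\varepsilon_{j,m}^2 - 1\right)
\end{aligned}$$

We obtain the same expression as the formula given by Equation (13.3), except that the
random variables $\varepsilon_{j,m}$ and $\varepsilon_{j',m}$ may now be correlated.

26 To avoid that $v_{m+1}$ is negative, we can use the truncation method:
$$v_{m+1} \leftarrow \max(v_{m+1}, 0)$$
or the reflection method:
$$v_{m+1} \leftarrow \left|v_{m+1}\right|$$
27 We have $\Delta W_{k,m} = W_k(t_{m+1}) - W_k(t_m)$.
28 By applying Itô's lemma to $Y(t) = \frac{1}{2}\left(W_j^2(t) - t\right)$, we obtain $\mathrm{d}Y(t) = W_j(t)\,\mathrm{d}W_j(t)$.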


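Example 149 If we apply the fixed-interval Milstein scheme to the Heston model, we obtain:

$$\ln X_{m+1} = \ln X_m + \left(\mu - \frac{1}{2} v_m\right) h + \sqrt{v_m h} \cdot \varepsilon_{1,m}$$

and:

$$v_{m+1} = v_m + a\,(b - v_m)\,h + \sigma\sqrt{v_m h} \cdot \varepsilon_{2,m} + \frac{1}{4}\sigma^2 h\left(\varepsilon_{2,m}^2 - 1\right)$$

Here, $\varepsilon_{1,m}$ and $\varepsilon_{2,m}$ are two standard Gaussian random variables with correlation $\rho$.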


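Remark 162 The multidimensional Milstein scheme is generally not used, because the
terms $\mathcal{L}^{(k)}\sigma_{j,k'}(t_m, X_m)\,\mathcal{I}_{(k,k')}$ are complicated to simulate. For the Heston model, we obtain
a very simple scheme, because we only apply the Milstein scheme to the process $v(t)$ and
not to the vector process $(\ln X(t), v(t))$. If we also apply the Milstein scheme to $\ln X(t)$,
we obtain:

$$\ln X_{m+1} = \ln X_m + \left(\mu - \frac{1}{2} v_m\right) h + \sqrt{v_m h} \cdot \varepsilon_{1,m} + A_m$$

where:

$$\begin{aligned}
A_m &= \sum_{k=1}^{2}\sum_{k'=1}^{2}\left(\sum_{k''=1}^{2} \sigma_{k'',k}(t_m, X_m)\,\frac{\partial \sigma_{1,k'}(t_m, X_m)}{\partial x_{k''}}\right)\mathcal{I}_{(k,k')} \\
&= \sigma\sqrt{v(t_m)} \cdot \frac{1}{2\sqrt{v(t_m)}} \cdot \mathcal{I}_{(2,1)} \\
&= \frac{\sigma}{2}\,\mathcal{I}_{(2,1)}
\end{aligned}$$

Let $W_2(t) = \rho\,W_1(t) + \sqrt{1 - \rho^2}\,W^{\star}(t)$ where $W^{\star}(t)$ is a Brownian motion independent
from $W_1(t)$. It follows that:

$$\begin{aligned}
\mathcal{I}_{(2,1)} &= \int_{t_m}^{t_{m+1}} \int_{t_m}^{s} \mathrm{d}W_2(t)\,\mathrm{d}W_1(s) \\
&= \int_{t_m}^{t_{m+1}} \left(\rho\,W_1(s) + \sqrt{1 - \rho^2}\,W^{\star}(s)\right)\mathrm{d}W_1(s) - \int_{t_m}^{t_{m+1}} \left(\rho\,W_1(t_m) + \sqrt{1 - \rho^2}\,W^{\star}(t_m)\right)\mathrm{d}W_1(s) \\
&= \rho \int_{t_m}^{t_{m+1}} \left(W_1(s) - W_1(t_m)\right)\mathrm{d}W_1(s) + \sqrt{1 - \rho^2} \int_{t_m}^{t_{m+1}} \left(W^{\star}(s) - W^{\star}(t_m)\right)\mathrm{d}W_1(s)
\end{aligned}$$

and:

$$\mathcal{I}_{(2,1)} = \frac{1}{2}\rho\left(\left(\Delta W_{1,m}\right)^2 - (t_{m+1} - t_m)\right) + B_m$$

We finally deduce that the multidimensional Milstein scheme of the Heston model is:

$$\ln X_{m+1} = \ln X_m + \left(\mu - \frac{1}{2} v_m\right) h + \sqrt{v_m h} \cdot \varepsilon_{1,m} + \frac{1}{4}\rho\,\sigma h\left(\varepsilon_{1,m}^2 - 1\right) + \frac{\sigma}{2} B_m$$

and:

$$v_{m+1} = v_m + a\,(b - v_m)\,h + \sigma\sqrt{v_m h} \cdot \varepsilon_{2,m} + \frac{1}{4}\sigma^2 h\left(\varepsilon_{2,m}^2 - 1\right)$$

where $B_m$ is a correction term defined by:

$$B_m = \sqrt{1 - \rho^2} \int_{t_m}^{t_{m+1}} \left(W^{\star}(s) - W^{\star}(t_m)\right)\mathrm{d}W_1(s)$$

We notice that $B_m$ cannot be explicitly calculated and requires numerical integration to be
simulated$^{29}$.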


13.3 Monte Carlo methods 


Originally, the Monte Carlo method was a numerical tool for computing integrals
based on the simulation of random variables (Metropolis and Ulam, 1949). By extension, it
now covers all numerical methods that use simulations.


13.3.1 Computing integrals 
13.3.1.1 A basic example 


One of the early uses of the Monte Carlo method was the numerical calculation of
the number $\pi$ by Buffon and Laplace. Suppose we have a circle with radius $r$ and a
$2r \times 2r$ square with the same center. Since the area of the circle is equal to $\pi r^2$, the numerical
calculation of $\pi$ is equivalent to computing the area of the circle with $r = 1$. In this case, the
area of the square is 4, and we have$^{30}$:

$$\frac{\pi}{4} = \frac{\mathcal{A}(\text{circle})}{\mathcal{A}(\text{square})}$$

To determine $\pi$, we simulate $n_S$ random vectors $(u_s, v_s)$ of uniform random variables $\mathcal{U}_{[-1,1]}$
and we obtain:

$$\pi \approx 4 \cdot \frac{n_c}{n_S}$$

where $n_c$ is the number of points $(u_s, v_s)$ in the circle:

$$n_c = \sum_{s=1}^{n_S} \mathbb{1}\left\{u_s^2 + v_s^2 \leq r^2\right\}$$

We illustrate this numerical computation in Figure 13.30 with 1000 simulated points
$(u_s, v_s)$. We indicate by a red cross symbol (resp. by a blue square symbol) the points
which are inside (resp. outside) the circle. In this experiment, we obtain $n_c = 802$ and
$\pi \approx 4 \times 802/1000 = 3.2080$.
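This experiment can be reproduced with the sketch below, which also returns the sample standard deviation of the variable $4 \cdot \mathbb{1}\{u_s^2 + v_s^2 \leq 1\}$. The function name `estimate_pi` and the number of simulations in the usage line are assumptions of this illustration.

```python
import numpy as np

def estimate_pi(n_sims, rng):
    """Monte Carlo estimate of pi from uniform points in [-1, 1]^2."""
    u = rng.uniform(-1.0, 1.0, n_sims)
    v = rng.uniform(-1.0, 1.0, n_sims)
    inside = u**2 + v**2 <= 1.0   # points falling in the unit circle
    h = 4.0 * inside              # each draw is worth 4 or 0
    return h.mean(), h.std(ddof=1)

pi_hat, s_hat = estimate_pi(200_000, np.random.default_rng(0))
```

Since each draw is four times a Bernoulli variable with probability $\pi/4$, the standard deviation is $4\sqrt{(\pi/4)(1 - \pi/4)} \approx 1.642$, so the estimate remains noisy even with many simulations.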


29 However, $B_m$ is not independent from $\varepsilon_{1,m}$.
30 In fact, this relationship holds for all values of $r$.
FIGURE 13.30: Computing $\pi$ with 1000 simulations


13.3.1.2 Theoretical framework 


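We consider the multiple integral:

$$I = \int \cdots \int_{\Omega} \varphi(x_1, \ldots, x_n)\,\mathrm{d}x_1 \cdots \mathrm{d}x_n$$

Let $X = (X_1, \ldots, X_n)$ be a uniform random vector with probability distribution $\mathcal{U}_{[\Omega]}$,
such that $\Omega$ is inscribed within the hypercube $[\Omega]$. By construction, the probability density
function is:

$$f(x_1, \ldots, x_n) = 1$$

We deduce that:

$$\begin{aligned}
I &= \int \cdots \int_{[\Omega]} \mathbb{1}\left\{(x_1, \ldots, x_n) \in \Omega\right\} \cdot \varphi(x_1, \ldots, x_n)\,\mathrm{d}x_1 \cdots \mathrm{d}x_n \\
&= \mathbb{E}\left[\mathbb{1}\left\{(X_1, \ldots, X_n) \in \Omega\right\} \cdot \varphi(X_1, \ldots, X_n)\right] \\
&= \mathbb{E}\left[h(X_1, \ldots, X_n)\right]
\end{aligned}$$

where:

$$h(x_1, \ldots, x_n) = \mathbb{1}\left\{(x_1, \ldots, x_n) \in \Omega\right\} \cdot \varphi(x_1, \ldots, x_n)$$

Let $\hat{I}_{n_S}$ be the random variable defined by:

$$\hat{I}_{n_S} = \frac{1}{n_S}\sum_{s=1}^{n_S} h(X_{1,s}, \ldots, X_{n,s})$$

where $\{X_{1,s}, \ldots, X_{n,s}\}_{s \geq 1}$ is a sequence of iid random vectors with probability distribution
$\mathcal{U}_{[\Omega]}$. Using the strong law of large numbers, we obtain:

$$\lim_{n_S \to \infty} \hat{I}_{n_S} = \mathbb{E}\left[h(X_1, \ldots, X_n)\right] = \int \cdots \int_{\Omega} \varphi(x_1, \ldots, x_n)\,\mathrm{d}x_1 \cdots \mathrm{d}x_n$$

Moreover, the central limit theorem states that:

$$\lim_{n_S \to \infty} \sqrt{n_S}\left(\frac{\hat{I}_{n_S} - I}{\sigma\left(h(X_1, \ldots, X_n)\right)}\right) = \mathcal{N}(0, 1)$$

When $n_S$ is large, we can deduce the following confidence interval:

$$\left[\hat{I}_{n_S} - c_{\alpha} \cdot \frac{\hat{S}_{n_S}}{\sqrt{n_S}},\ \hat{I}_{n_S} + c_{\alpha} \cdot \frac{\hat{S}_{n_S}}{\sqrt{n_S}}\right]$$

where $\alpha$ is the confidence level, $c_{\alpha} = \Phi^{-1}\left((1 + \alpha)/2\right)$ and $\hat{S}_{n_S}$ is the usual estimate of the
standard deviation:

$$\hat{S}_{n_S} = \sqrt{\frac{1}{n_S - 1}\sum_{s=1}^{n_S}\left(h(X_{1,s}, \ldots, X_{n,s}) - \hat{I}_{n_S}\right)^2}$$

FIGURE 13.31: Density function of $\hat{\pi}_{n_S}$ ($n_S = 1000$ and $n_S = 10000$)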

We consider again the calculation of $\pi$. Previously, we obtained an estimate that is far from the true value. In order to obtain a better precision, we can increase the number $n_S$ of simulations. In Figure 13.31, we report the density function of the estimator $\hat{\pi}_{n_S}$ when $n_S$ is respectively equal to 1000 and 10000. We notice that the precision increases by a factor of $\sqrt{10}$ every time we multiply the number of simulations by ten. More generally, for a given precision $p$, we can deduce the sufficient number of simulations:

\[ n_S \ge \left( c_\alpha \frac{\hat{S}_{n_S}}{p} \right)^2 \]

In the case of the calculation of $\pi$, we have $\hat{S}_{n_S} \approx 1.642$. To obtain a precision of $\pi$ with six digits after the decimal point at a 99% confidence level, we need about 18 trillion simulations:

\[ n_S \ge \left( \frac{\Phi^{-1}(0.995) \times 1.642}{10^{-6}} \right)^2 \ge 17.9 \times 10^{12} \]
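
The order of magnitude above can be checked numerically. The sketch below assumes that the estimator of $\pi$ is the average of $4 \cdot \mathbb{1}\{u_s^2 + v_s^2 \le 1\}$, so that its standard deviation is $4\sqrt{p(1-p)}$ with $p = \pi/4$; the helper name `required_simulations` is hypothetical.

```python
import math
from statistics import NormalDist

def required_simulations(precision, sigma, alpha=0.99):
    """Smallest n_S such that c_alpha * sigma / sqrt(n_S) <= precision."""
    c_alpha = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    return math.ceil((c_alpha * sigma / precision) ** 2)

# Standard deviation of h = 4 * 1{u^2 + v^2 <= 1}, a scaled Bernoulli variable
p_circle = math.pi / 4.0
sigma_pi = 4.0 * math.sqrt(p_circle * (1.0 - p_circle))

n_needed = required_simulations(1e-6, sigma_pi)
print(f"sigma = {sigma_pi:.3f}, n_S >= {n_needed:.3e}")
```

Because the required number of simulations grows with the inverse square of the precision, each additional digit multiplies the cost by 100.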


Example 150 We would like to calculate the following integral: 


\[ I = \iiint_{\Omega} (x + y + z) \, \mathrm{d}x \, \mathrm{d}y \, \mathrm{d}z \]


This integral can be easily evaluated with Gaussian quadrature methods when $\Omega$ is a cube.
However, the problem is more tricky when:


Q= { (x,y,z) E RẸ : 1? +y? +2? <2,c+y+2> 2} 


Using the Monte Carlo method, we have I = E [h (X,Y, Z)] where X,Y and Z are three 
independent uniform random variables with probability distribution Ujo,s) and: 


\[
h(x, y, z) = \begin{cases}
x + y + z & \text{if } (x, y, z) \in \Omega \\
0 & \text{if } (x, y, z) \notin \Omega
\end{cases}
\]

In Figure 13.32, we report the estimate $\hat{I}_{n_S}$ and the corresponding 99% confidence interval with respect to the number of simulations $n_S$.


13.3.1.3 Extension to the calculation of mathematical expectations 


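Let $X = (X_1, \ldots, X_n)$ be a random vector with probability distribution $\mathbf{F}$. We have:

\[
\begin{aligned}
\mathbb{E}\left[ \varphi(X_1, \ldots, X_n) \right] &= \int \cdots \int \varphi(x_1, \ldots, x_n) \, \mathrm{d}\mathbf{F}(x_1, \ldots, x_n) \\
  &= \int \cdots \int \varphi(x_1, \ldots, x_n) \, f(x_1, \ldots, x_n) \, \mathrm{d}x_1 \cdots \mathrm{d}x_n \\
  &= \int \cdots \int h(x_1, \ldots, x_n) \, \mathrm{d}x_1 \cdots \mathrm{d}x_n
\end{aligned}
\]

where $f$ is the density function and $h = \varphi \cdot f$. The Monte Carlo estimator of this integral is:

\[ \hat{I}_{n_S} = \frac{1}{n_S} \sum_{s=1}^{n_S} \varphi(X_{1,s}, \ldots, X_{n,s}) \]

where $\{X_{1,s}, \ldots, X_{n,s}\}_{s \ge 1}$ is a sequence of iid random vectors with probability distribution $\mathbf{F}$. Moreover, all the previous results hold in this general case where the random variables are not uniform.

FIGURE 13.32: Convergence of the estimator $\hat{I}_{n_S}$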


Example 151 In the Black-Scholes model, the price of the look-back option with maturity $T$ is given by:

\[ C = e^{-rT} \, \mathbb{E}\left[ \left( S(T) - \min_{0 \le t \le T} S(t) \right)^+ \right] \]

where the price $S(t)$ of the underlying asset is given by the following SDE:

\[ \mathrm{d}S(t) = r S(t) \, \mathrm{d}t + \sigma S(t) \, \mathrm{d}W(t) \]

where $r$ is the interest rate and $\sigma$ is the volatility of the asset. It is difficult to calculate $C$ analytically, because it requires the joint distribution of $S(T)$ and $\min_{0 \le t \le T} S(t)$. However, we can easily calculate it using the Monte Carlo method. For a given simulation $s$, we use the exact scheme to simulate the geometric Brownian motion:

\[ S_{t_{m+1}}^{(s)} = S_{t_m}^{(s)} \exp\left( \left( r - \frac{1}{2} \sigma^2 \right) (t_{m+1} - t_m) + \sigma \sqrt{t_{m+1} - t_m} \; \varepsilon_m^{(s)} \right) \]

where $\varepsilon_m^{(s)} \sim \mathcal{N}(0, 1)$ and $T = t_M$. The Monte Carlo estimator of the option price is then equal to:

\[ \hat{C} = \frac{e^{-rT}}{n_S} \sum_{s=1}^{n_S} \left( S_{t_M}^{(s)} - \min_{m} S_{t_m}^{(s)} \right)^+ \]


We deduce that the precision of the estimate depends on the number $n_S$ of simulations, but also on the number $M$ of discretization points. In Figure 13.33, the option price is calculated using these parameters: $S_0 = 100$, $r = 5\%$, $\sigma = 20\%$ and $T = 3/12$. We consider 100000 simulations whereas the number $M$ of discretization points varies between 5 and 100. We notice that the 99% confidence interval does not really depend on $M$. However, the option price increases with the number of discretization points. This is expected because the discretely monitored minimum $\min_m S_{t_m}^{(s)}$ can never be lower than the continuous minimum $\min_{0 \le t \le T} S(t)$, so the payoff is always underestimated. This is why we have also reported the option price using the diffusion bridge approach. This example shows that the MC method does not always converge when the function $\varphi(x_1, \ldots, x_n)$ is approximated.

FIGURE 13.33: Computing the look-back option price (option price against the number of discretization points, with the price obtained with the diffusion bridge)
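
The discretization effect described in Example 151 can be reproduced with a short simulation. This is a sketch only: it assumes a regular time grid, includes $S(0)$ in the running minimum, and uses far fewer simulations than the 100000 of the example; the function name `lookback_call_mc` is hypothetical.

```python
import math
import random

def lookback_call_mc(s0, r, sigma, maturity, n_steps, n_sims, seed=0):
    """Monte Carlo price of the payoff (S(T) - min_m S(t_m))^+ on a regular grid."""
    rng = random.Random(seed)
    h = maturity / n_steps
    drift = (r - 0.5 * sigma ** 2) * h
    vol = sigma * math.sqrt(h)
    total = 0.0
    for _ in range(n_sims):
        s = s_min = s0  # assumption: the running minimum starts at S(0)
        for _ in range(n_steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))  # exact GBM step
            s_min = min(s_min, s)
        total += s - s_min  # non-negative by construction, so (.)^+ is automatic
    return math.exp(-r * maturity) * total / n_sims

for m in (5, 25, 100):
    price = lookback_call_mc(100.0, 0.05, 0.20, 0.25, m, 5000)
    print(f"M = {m:3d}: price = {price:.3f}")
```

The estimated price should rise with $M$, since a finer grid gets closer to the continuous minimum.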


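Let us consider the following integral:

\[ I = \int \cdots \int h(x_1, \ldots, x_n) \, \mathrm{d}x_1 \cdots \mathrm{d}x_n \]

We can write it as follows:

\[ I = \int \cdots \int \frac{h(x_1, \ldots, x_n)}{f(x_1, \ldots, x_n)} \, f(x_1, \ldots, x_n) \, \mathrm{d}x_1 \cdots \mathrm{d}x_n \]

where $f(x_1, \ldots, x_n)$ is a multidimensional density function. We deduce that:

\[ I = \mathbb{E}\left[ \frac{h(X_1, \ldots, X_n)}{f(X_1, \ldots, X_n)} \right] \]

This implies that we can compute an integral with the MC method by using any multidimensional distribution function. If we apply this result to the calculation of $\pi$, we have:

\[
\begin{aligned}
\pi &= \iint_{x^2 + y^2 \le 1} \mathrm{d}x \, \mathrm{d}y \\
  &= \iint \mathbb{1}\{x^2 + y^2 \le 1\} \, \mathrm{d}x \, \mathrm{d}y \\
  &= \iint \frac{\mathbb{1}\{x^2 + y^2 \le 1\}}{\phi(x) \phi(y)} \, \phi(x) \phi(y) \, \mathrm{d}x \, \mathrm{d}y
\end{aligned}
\]

We deduce that:

\[ \pi = \mathbb{E}\left[ \frac{\mathbb{1}\{X^2 + Y^2 \le 1\}}{\phi(X) \phi(Y)} \right] \]

where $X$ and $Y$ are two independent standard Gaussian random variables and $\phi$ is the density function of $\mathcal{N}(0, 1)$. We can then estimate $\pi$ by:

\[ \hat{\pi}_{n_S} = \frac{1}{n_S} \sum_{s=1}^{n_S} \frac{\mathbb{1}\{x_s^2 + y_s^2 \le 1\}}{\phi(x_s) \phi(y_s)} \]

where $x_s$ and $y_s$ are two independent random variates from the probability distribution $\mathcal{N}(0, 1)$. For instance, we report the points $(x_s, y_s)$ used to calculate $\pi$ with 1000 simulations in Figure 13.34.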
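
The Gaussian version of the $\pi$ estimator can be sketched as follows. The code is an illustration under the assumption that the two coordinates are drawn independently from $\mathcal{N}(0,1)$ with the standard library generator; `estimate_pi_gaussian` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist

def estimate_pi_gaussian(n_s, seed=0):
    """Estimate pi as the mean of 1{x^2 + y^2 <= 1} / (phi(x) * phi(y))."""
    rng = random.Random(seed)
    pdf = NormalDist().pdf  # standard normal density
    total = 0.0
    for _ in range(n_s):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        if x * x + y * y <= 1.0:
            total += 1.0 / (pdf(x) * pdf(y))  # weight by the inverse density
    return total / n_s

print(f"pi_hat = {estimate_pi_gaussian(1000):.4f}")
```

Only the draws that fall inside the unit disk contribute, each one weighted by the inverse of the sampling density.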


FIGURE 13.34: Computing $\pi$ with normal random numbers


Remark 163 The previous approach is particularly interesting when the set $\Omega$ is not bounded, which implies that we cannot use uniform random numbers.


13.3.2 Variance reduction 
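
We consider two unbiased estimators $\hat{I}_{n_S}^{(1)}$ and $\hat{I}_{n_S}^{(2)}$ of the integral $I$, meaning that $\mathbb{E}[\hat{I}_{n_S}^{(1)}] = \mathbb{E}[\hat{I}_{n_S}^{(2)}] = I$. We will say that $\hat{I}_{n_S}^{(1)}$ is more efficient than $\hat{I}_{n_S}^{(2)}$ if the inequality $\operatorname{var}(\hat{I}_{n_S}^{(1)}) \le \operatorname{var}(\hat{I}_{n_S}^{(2)})$ holds for all values of $n_S$ that are larger than $n_S^\star$. Variance reduction is then the search for more efficient estimators.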


13.3.2.1 Antithetic variates 


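Theoretical aspects We have:

\[ I = \mathbb{E}\left[ \varphi(X_1, \ldots, X_n) \right] = \mathbb{E}[Y] \]

where $Y = \varphi(X_1, \ldots, X_n)$ is a one-dimensional random variable. It follows that:

\[ \hat{I}_{n_S} = \bar{Y}_{n_S} = \frac{1}{n_S} \sum_{s=1}^{n_S} Y_s \]

We now consider the estimators $\bar{Y}_{n_S}$ and $\bar{Y}'_{n_S}$ based on two different samples and define $\bar{Y}^\star$ as follows:

\[ \bar{Y}^\star = \frac{\bar{Y}_{n_S} + \bar{Y}'_{n_S}}{2} \]

We have:

\[ \mathbb{E}\left[ \bar{Y}^\star \right] = \mathbb{E}\left[ \frac{\bar{Y}_{n_S} + \bar{Y}'_{n_S}}{2} \right] = \mathbb{E}\left[ \bar{Y}_{n_S} \right] = I \]

and:

\[
\begin{aligned}
\operatorname{var}\left( \bar{Y}^\star \right) &= \operatorname{var}\left( \frac{\bar{Y}_{n_S} + \bar{Y}'_{n_S}}{2} \right) \\
  &= \frac{1}{4} \operatorname{var}\left( \bar{Y}_{n_S} \right) + \frac{1}{4} \operatorname{var}\left( \bar{Y}'_{n_S} \right) + \frac{1}{2} \operatorname{cov}\left( \bar{Y}_{n_S}, \bar{Y}'_{n_S} \right) \\
  &= \frac{1 + \rho\langle \bar{Y}_{n_S}, \bar{Y}'_{n_S} \rangle}{2} \operatorname{var}\left( \bar{Y}_{n_S} \right) \\
  &= \frac{1 + \rho\langle Y_s, Y'_s \rangle}{2} \operatorname{var}\left( \bar{Y}_{n_S} \right)
\end{aligned}
\]

where$^{31}$ $\rho\langle Y_s, Y'_s \rangle$ is the correlation between $Y_s$ and $Y'_s$. Because we have $\rho\langle Y_s, Y'_s \rangle \le 1$, we deduce that:

\[ \operatorname{var}\left( \bar{Y}^\star \right) \le \operatorname{var}\left( \bar{Y}_{n_S} \right) \]

If we simulate the random variates $Y_s$ and $Y'_s$ independently, $\rho\langle Y_s, Y'_s \rangle$ is equal to zero and the variance of the estimator is divided by 2. However, the number of simulations has been multiplied by two. The efficiency of the estimator has then not been improved.

The underlying idea of antithetic variables is therefore to use two perfectly dependent random variables $Y_s$ and $Y'_s$:

\[ Y'_s = \psi(Y_s) \]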
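31 We have:

\[
\begin{aligned}
\operatorname{cov}\left( \bar{Y}_{n_S}, \bar{Y}'_{n_S} \right) &= \mathbb{E}\left[ \left( \frac{1}{n_S} \sum_{s=1}^{n_S} Y_s - \mathbb{E}[Y] \right) \left( \frac{1}{n_S} \sum_{s=1}^{n_S} Y'_s - \mathbb{E}[Y] \right) \right] \\
  &= \frac{1}{n_S^2} \sum_{s=1}^{n_S} \mathbb{E}\left[ (Y_s - \mathbb{E}[Y]) (Y'_s - \mathbb{E}[Y]) \right] + \frac{1}{n_S^2} \sum_{s \ne s'} \mathbb{E}\left[ (Y_s - \mathbb{E}[Y]) (Y'_{s'} - \mathbb{E}[Y]) \right] \\
  &= \frac{1}{n_S} \operatorname{cov}\left( Y_s, Y'_s \right)
\end{aligned}
\]

It follows that:

\[ \rho\langle \bar{Y}_{n_S}, \bar{Y}'_{n_S} \rangle = \frac{\operatorname{cov}\left( \bar{Y}_{n_S}, \bar{Y}'_{n_S} \right)}{\sqrt{\operatorname{var}\left( \bar{Y}_{n_S} \right) \operatorname{var}\left( \bar{Y}'_{n_S} \right)}} = \rho\langle Y_s, Y'_s \rangle \]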
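where $\psi$ is a deterministic function. This implies that:

\[ \bar{Y}^\star = \frac{1}{n_S} \sum_{s=1}^{n_S} Y_s^\star \]

where:

\[ Y_s^\star = \frac{Y_s + Y'_s}{2} = \frac{Y_s + \psi(Y_s)}{2} \]

It follows that:

\[ \rho\langle \bar{Y}_{n_S}, \bar{Y}'_{n_S} \rangle = \rho\langle Y, Y' \rangle = \rho\langle Y, \psi(Y) \rangle \]

Minimizing the variance $\operatorname{var}(\bar{Y}^\star)$ is then equivalent to minimizing the correlation $\rho\langle Y, \psi(Y) \rangle$. We also know that the correlation reaches its lower bound if the dependence function between $Y$ and $\psi(Y)$ is equal to the lower Fréchet copula:

\[ \mathbf{C}\langle Y, \psi(Y) \rangle = \mathbf{C}^- \]

However, $\rho\langle Y, \psi(Y) \rangle$ is not necessarily equal to $-1$ except in some special cases.

We consider the one-dimensional case with $Y = \varphi(X)$. If we assume that $\varphi$ is an increasing function, it follows that:

\[
\begin{aligned}
\mathbf{C}\langle Y, \psi(Y) \rangle &= \mathbf{C}\langle \varphi(X), \psi(\varphi(X)) \rangle \\
  &= \mathbf{C}\langle X, \psi(X) \rangle
\end{aligned}
\]

To obtain the lower bound $\mathbf{C}^-$, $X$ and $\psi(X)$ must be countermonotonic. We know that$^{32}$:

\[ \psi(X) = \mathbf{F}^{-1}\left( 1 - \mathbf{F}(X) \right) \qquad (13.8) \]

where $\mathbf{F}$ is the probability distribution of $X$. For instance, if $X \sim \mathcal{U}_{[0,1]}$, we have $X' = 1 - X$. In the case where $X \sim \mathcal{N}(0, 1)$, we have:

\[
\begin{aligned}
X' &= \Phi^{-1}\left( 1 - \Phi(X) \right) \\
  &= \Phi^{-1}\left( \Phi(-X) \right) \\
  &= -X
\end{aligned}
\]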


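Example 152 We consider the following functions:

1. $\varphi_1(x) = x^3 + x + 1$;
2. $\varphi_2(x) = x^4 + x^2 + 1$;
3. $\varphi_3(x) = x^4 + x^3 + x^2 + x + 1$.

For each function, we want to estimate $I = \mathbb{E}[\varphi(\mathcal{N}(0, 1))]$ using the antithetic estimator:

\[ \bar{Y}^\star = \frac{1}{n_S} \sum_{s=1}^{n_S} \frac{\varphi(X_s) + \varphi(-X_s)}{2} \]

We obtain the following results$^{33}$:

    phi(x)                                phi_1(x)   phi_2(x)   phi_3(x)
    E[phi(X_s)] or E[phi(-X_s)]               1          5          5
    var(phi(X_s)) or var(phi(-X_s))          22        122        144
    cov(phi(X_s), phi(-X_s))                -22        122        100
    rho<phi(X_s), phi(-X_s)>                 -1          1        25/36


32 See Section 11.2.1 on page 722.
33 Let $X \sim \mathcal{N}(0, 1)$. We have $\mathbb{E}[X^2] = 1$, $\mathbb{E}[X^{2m}] = (2m - 1) \, \mathbb{E}[X^{2m-2}]$ and $\mathbb{E}[X^{2m+1}] = 0$ for $m \in \mathbb{N}$.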
We notice that the antithetic estimator is fully efficient in the first case, because its variance is equal to zero. In the second case, it is not efficient because we have $\operatorname{var}(\bar{Y}^\star) = \operatorname{var}(\bar{Y}_{n_S})$. Finally, the antithetic estimator reduces the variance by 15.3% in the last case.
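
These figures can be verified exactly from the Gaussian moments, without any simulation. The sketch below represents each polynomial by its list of coefficients (constant term first) and applies the recursion on the even moments of $\mathcal{N}(0,1)$; the helper names are hypothetical.

```python
from fractions import Fraction

def gaussian_moment(k):
    """E[X^k] for X ~ N(0,1): zero for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0
    moment = 1
    for j in range(1, k, 2):
        moment *= j
    return moment

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def expect(p):
    return sum(c * gaussian_moment(k) for k, c in enumerate(p))

def antithetic_stats(p):
    """Mean, variance, covariance and correlation of (phi(X), phi(-X))."""
    p_neg = [c * (-1) ** k for k, c in enumerate(p)]  # coefficients of phi(-x)
    mean = expect(p)  # E[phi(-X)] = E[phi(X)] by symmetry
    var = expect(poly_mul(p, p)) - mean ** 2
    cov = expect(poly_mul(p, p_neg)) - mean ** 2
    return mean, var, cov, Fraction(cov, var)

for name, coeffs in [("phi_1", [1, 1, 0, 1]),
                     ("phi_2", [1, 0, 1, 0, 1]),
                     ("phi_3", [1, 1, 1, 1, 1])]:
    mean, var, cov, rho = antithetic_stats(coeffs)
    reduction = 1 - (1 + rho) / 2  # relative variance reduction of the AV estimator
    print(name, mean, var, cov, rho, f"{float(reduction):.1%}")
```

For $\varphi_3$, the ratio $(1 + \rho)/2 = 61/72$ corresponds to the 15.3% reduction quoted above.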


To understand these numerical results, we must study the relationship between $\mathbf{C}\langle X, X' \rangle$ and $\mathbf{C}\langle Y, Y' \rangle$. Indeed, we have:

\[ \left\{ \mathbf{C}\langle X, X' \rangle = \mathbf{C}^- \Rightarrow \mathbf{C}\langle Y, Y' \rangle = \mathbf{C}^- \right\} \Leftrightarrow \varphi'(x) \ge 0 \]

We have represented the three functions $\varphi_1(x)$, $\varphi_2(x)$ and $\varphi_3(x)$ in Figure 13.35. Because $\varphi_1(x)$ is an increasing function, it follows that the copula function between $Y$ and $Y'$ reaches the lower Fréchet bound. The function $\varphi_2(x)$ is perfectly symmetric around $x = 0$. In this case, it is impossible to reduce the variance of the MC estimator by the use of antithetic Gaussian variates. Even if the function $\varphi_3(x)$ is not monotonic, it is sufficiently asymmetric to obtain a low but significant reduction of the variance of the MC estimator.


FIGURE 13.35: Functions $\varphi_1(x)$, $\varphi_2(x)$ and $\varphi_3(x)$


Remark 164 In the case where $\varphi$ is a decreasing function, we can show that the lower bound $\mathbf{C}^-$ between $Y$ and $Y'$ is also reached when $X$ and $\psi(X)$ are countermonotonic (Ross, 2012).


The extension of the previous results to the multidimensional case is not straightforward. Indeed, the copula condition between $Y$ and $Y'$ becomes:

\[ \mathbf{C}\langle Y, Y' \rangle = \mathbf{C}^- \Leftrightarrow \mathbf{C}\langle \varphi(X_1, \ldots, X_n), \varphi(X'_1, \ldots, X'_n) \rangle = \mathbf{C}^- \]

where $X'_1, \ldots, X'_n$ are the antithetic variates of $X_1, \ldots, X_n$. A natural generalization of the relationship (13.8) is:

\[ X'_i = \mathbf{F}_i^{-1}\left( 1 - \mathbf{F}_i(X_i) \right) \]

where $\mathbf{F}_i$ is the probability distribution of $X_i$. By assuming that $\varphi$ is a monotonic function of each of its arguments, Ross (2012) shows that:

\[ \rho\langle Y, Y' \rangle < 0 \]

where:

\[ Y' = \varphi\left( \mathbf{F}_1^{-1}(1 - \mathbf{F}_1(X_1)), \ldots, \mathbf{F}_n^{-1}(1 - \mathbf{F}_n(X_n)) \right) \]

This means that we can reduce the variance of the MC estimator by using the antithetic variates $X'_i = \mathbf{F}_i^{-1}(1 - \mathbf{F}_i(X_i))$. However, it does not prove that this approach minimizes the correlation $\rho\langle Y, Y' \rangle$. Moreover, we have no results when $\varphi$ is a general function.


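Application to the geometric Brownian motion In the Gaussian case $X \sim \mathcal{N}(0, 1)$, the antithetic variable is:

\[ X' = -X \]

As the simulation of $Y \sim \mathcal{N}(\mu, \sigma^2)$ is obtained using the relationship $Y = \mu + \sigma X$, we deduce that the antithetic variable is:

\[
\begin{aligned}
Y' &= \mu - \sigma X \\
  &= \mu - \sigma \, \frac{Y - \mu}{\sigma} \\
  &= 2\mu - Y
\end{aligned}
\]

If we consider the geometric Brownian motion, the fixed-interval scheme is:

\[ X_{m+1} = X_m \cdot \exp\left( \left( \mu - \frac{1}{2} \sigma^2 \right) h + \sigma \sqrt{h} \cdot \varepsilon_m \right) \]

whereas the antithetic path is given by:

\[ X'_{m+1} = X'_m \cdot \exp\left( \left( \mu - \frac{1}{2} \sigma^2 \right) h - \sigma \sqrt{h} \cdot \varepsilon_m \right) \]

In Figure 13.36, we report 4 trajectories of the GBM process and the corresponding antithetic paths$^{34}$.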
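
A pair of antithetic GBM paths can be generated by reusing the same Gaussian draws with opposite signs. The sketch below uses the parameter values of the figure ($X_0 = 100$, $\mu = 10\%$, $\sigma = 20\%$); the daily step $h = 1/252$ and the function name `gbm_antithetic_paths` are assumptions made for illustration.

```python
import math
import random

def gbm_antithetic_paths(x0, mu, sigma, h, n_steps, seed=0):
    """Simulate one GBM path and its antithetic path with the fixed-interval scheme."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * h
    vol = sigma * math.sqrt(h)
    path, anti = [x0], [x0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp(drift + vol * eps))
        anti.append(anti[-1] * math.exp(drift - vol * eps))  # same draw, opposite sign
    return path, anti

path, anti = gbm_antithetic_paths(100.0, 0.10, 0.20, 1.0 / 252.0, 252)
print(f"X(T) = {path[-1]:.2f}, X'(T) = {anti[-1]:.2f}")
```

Since the random shocks cancel, the product $X_m X'_m$ is deterministic and equal to $X_0^2 \exp(2 (\mu - \sigma^2/2) \, m h)$, which gives a simple consistency check on the implementation.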


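In the multidimensional case, we recall that:

\[ X_{i,m+1} = X_{i,m} \cdot \exp\left( \left( \mu_i - \frac{1}{2} \sigma_i^2 \right) h + \sigma_i \sqrt{h} \cdot \varepsilon_{i,m} \right) \]

where $\varepsilon_m = (\varepsilon_{1,m}, \ldots, \varepsilon_{n,m}) \sim \mathcal{N}_n(0, \rho)$. We simulate $\varepsilon_m$ by using the relationship $\varepsilon_m = P \cdot \eta_m$ where $\eta_m \sim \mathcal{N}_n(0, I_n)$ and $P$ is the Cholesky matrix satisfying $P P^\top = \rho$. The antithetic trajectory is then:

\[ X'_{i,m+1} = X'_{i,m} \cdot \exp\left( \left( \mu_i - \frac{1}{2} \sigma_i^2 \right) h + \sigma_i \sqrt{h} \cdot \varepsilon'_{i,m} \right) \]

where:

\[ \varepsilon'_m = -P \cdot \eta_m = -\varepsilon_m \]

We verify that $\varepsilon'_m = (\varepsilon'_{1,m}, \ldots, \varepsilon'_{n,m}) \sim \mathcal{N}_n(0, \rho)$.


34 The parameter values are $X_0 = 100$, $\mu = 10\%$ and $\sigma = 20\%$.

FIGURE 13.36: Antithetic simulation of the GBM process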


Example 153 In the Black-Scholes model, the price of the spread option with maturity $T$ and strike $K$ is given by:

\[ C = e^{-rT} \, \mathbb{E}\left[ \left( S_1(T) - S_2(T) - K \right)^+ \right] \]

where the prices $S_1(t)$ and $S_2(t)$ of the underlying assets are given by the following SDE:

\[
\begin{cases}
\mathrm{d}S_1(t) = r S_1(t) \, \mathrm{d}t + \sigma_1 S_1(t) \, \mathrm{d}W_1(t) \\
\mathrm{d}S_2(t) = r S_2(t) \, \mathrm{d}t + \sigma_2 S_2(t) \, \mathrm{d}W_2(t)
\end{cases}
\]

and $\mathbb{E}[W_1(t) W_2(t)] = \rho t$. To calculate the option price using Monte Carlo methods, we simulate the bivariate GBM $S_1(t)$ and $S_2(t)$ and the MC estimator is:

\[ \hat{C}_{MC} = \frac{e^{-rT}}{n_S} \sum_{s=1}^{n_S} \left( S_1^{(s)}(T) - S_2^{(s)}(T) - K \right)^+ \]

where $S_i^{(s)}(T)$ is the $s$-th simulation of the terminal value $S_i(T)$. For the AV estimator, we obtain:

\[ \hat{C}_{AV} = \frac{e^{-rT}}{n_S} \sum_{s=1}^{n_S} \frac{\left( S_1^{(s)}(T) - S_2^{(s)}(T) - K \right)^+ + \left( S_1'^{(s)}(T) - S_2'^{(s)}(T) - K \right)^+}{2} \]

where $S_i'^{(s)}(T)$ is the antithetic variate of $S_i^{(s)}(T)$. In Figure 13.37, we report the probability density function of the estimators $\hat{C}_{MC}$ and $\hat{C}_{AV}$ when $n_S$ is equal to 1000$^{35}$. We observe that the variance reduction is significant and we obtain:

\[ \frac{\operatorname{var}(\hat{C}_{AV})}{\operatorname{var}(\hat{C}_{MC})} = 34.7\% \]


35 The parameters are $S_1(0) = S_2(0) = 100$, $r = 5\%$, $\sigma_1 = \sigma_2 = 20\%$, $\rho = 50\%$, $T = 1$ and $K = 5$.

FIGURE 13.37: Probability density function of $\hat{C}_{MC}$ and $\hat{C}_{AV}$ ($n_S = 1000$)


13.3.2.2 Control variates 


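Let $Y = \varphi(X_1, \ldots, X_n)$ and $V$ be a random variable with known mean $\mathbb{E}[V]$. We define $Z$ as follows:

\[ Z = Y + c \cdot (V - \mathbb{E}[V]) \]

We deduce that:

\[ \mathbb{E}[Z] = \mathbb{E}\left[ Y + c \cdot (V - \mathbb{E}[V]) \right] = \mathbb{E}[Y] \]

and:

\[
\begin{aligned}
\operatorname{var}(Z) &= \operatorname{var}\left( Y + c \cdot (V - \mathbb{E}[V]) \right) \\
  &= \operatorname{var}(Y) + 2 \cdot c \cdot \operatorname{cov}(Y, V) + c^2 \cdot \operatorname{var}(V)
\end{aligned}
\]

It follows that:

\[
\begin{aligned}
\operatorname{var}(Z) \le \operatorname{var}(Y) &\Leftrightarrow 2 \cdot c \cdot \operatorname{cov}(Y, V) + c^2 \cdot \operatorname{var}(V) \le 0 \\
  &\Rightarrow c \cdot \operatorname{cov}(Y, V) \le 0
\end{aligned}
\]

In order to obtain a lower variance, a necessary condition is that $c$ and $\operatorname{cov}(Y, V)$ have opposite signs. The minimum is obtained when $\partial_c \operatorname{var}(Z) = 0$ or equivalently when:

\[ c^\star = -\frac{\operatorname{cov}(Y, V)}{\operatorname{var}(V)} = -\beta \]

The optimal value $c^\star$ is then equal to the opposite of the beta of $Y$ with respect to the control variate $V$. In this case, we have:

\[ Z = Y - \frac{\operatorname{cov}(Y, V)}{\operatorname{var}(V)} \cdot (V - \mathbb{E}[V]) \]

and:

\[
\begin{aligned}
\operatorname{var}(Z) &= \operatorname{var}(Y) - \frac{\operatorname{cov}^2(Y, V)}{\operatorname{var}(V)} \\
  &= \left( 1 - \rho^2\langle Y, V \rangle \right) \cdot \operatorname{var}(Y)
\end{aligned}
\]

This implies that we have to choose a control variate $V$ that is highly (positively or negatively) correlated with $Y$ in order to reduce the variance.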


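Example 154 We consider that $X \sim \mathcal{U}_{[0,1]}$ and $\varphi(x) = e^x$. We would like to estimate:

\[ I = \mathbb{E}\left[ \varphi(X) \right] = \int_0^1 e^x \, \mathrm{d}x \]

We set $Y = e^X$ and $V = X$. We know that $\mathbb{E}[V] = 1/2$ and $\operatorname{var}(V) = 1/12$. It follows that:

\[
\begin{aligned}
\operatorname{var}(Y) &= \mathbb{E}[Y^2] - \mathbb{E}^2[Y] \\
  &= \int_0^1 e^{2x} \, \mathrm{d}x - \left( \int_0^1 e^x \, \mathrm{d}x \right)^2 \\
  &= \frac{e^2 - 1}{2} - (e - 1)^2 \\
  &= \frac{4e - e^2 - 3}{2} \\
  &\approx 0.2420
\end{aligned}
\]

and:

\[
\begin{aligned}
\operatorname{cov}(Y, V) &= \mathbb{E}[VY] - \mathbb{E}[V] \, \mathbb{E}[Y] \\
  &= \int_0^1 x e^x \, \mathrm{d}x - \frac{1}{2} \int_0^1 e^x \, \mathrm{d}x \\
  &= 1 - \frac{e - 1}{2} \\
  &= \frac{3 - e}{2} \\
  &\approx 0.1409
\end{aligned}
\]

If we consider the CV estimator $Z$ defined by$^{36}$:

\[
\begin{aligned}
Z &= Y - \frac{\operatorname{cov}(Y, V)}{\operatorname{var}(V)} \cdot (V - \mathbb{E}[V]) \\
  &= Y - (18 - 6e) \cdot \left( V - \frac{1}{2} \right)
\end{aligned}
\]

we obtain:

\[
\begin{aligned}
\operatorname{var}(Z) &= \operatorname{var}(Y) - \frac{\operatorname{cov}^2(Y, V)}{\operatorname{var}(V)} \\
  &= \frac{4e - e^2 - 3}{2} - 3 (3 - e)^2 \\
  &\approx 0.0039
\end{aligned}
\]

We conclude that we have dramatically reduced the variance of the estimator, because we have:

\[ \frac{\operatorname{var}(Z)}{\operatorname{var}(Y)} \approx 1.628\% \]


36 We have $\beta \approx 1.6903$.

FIGURE 13.38: Understanding the variance reduction in control variates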


This example may be disturbing, because the variance reduction is huge. To understand the mechanisms underlying control variates, we illustrate the previous example in Figure 13.38. For each variable, we have represented the relationship with respect to the random variable $X$. We have $Y = \exp(X)$ and $V = X$. To maximize the dependence between $Y$ and the control variate, it is better to consider $\beta V$ instead of $V$. However, the random variable $\beta V$ is not well located, because it does not fit $Y$ well. This is not the case of $\hat{Y} = \mathbb{E}[Y] + \beta (V - \mathbb{E}[V])$. Indeed, in the linear regression framework, $\hat{Y}$ is the conditional expectation of $Y$ with respect to $V$:

\[ \mathbb{E}[Y \mid V] = \mathbb{E}[Y] + \beta (V - \mathbb{E}[V]) \]

This is the best linear estimator of $Y$. The residual $U$ of the linear regression is then equal to:

\[ U = Y - \hat{Y} \]

The CV estimator $Z$ is a translation of the residual in order to satisfy $\mathbb{E}[Z] = \mathbb{E}[Y]$:

\[
\begin{aligned}
Z &= \mathbb{E}[Y] + U \\
  &= Y - \beta (V - \mathbb{E}[V])
\end{aligned}
\]

By construction, the variance of the residual $U$ is lower than the variance of the random variable $Y$. We conclude that:

\[ \operatorname{var}(Z) = \operatorname{var}(U) \le \operatorname{var}(Y) \]

We can therefore obtain a large variance reduction if the following conditions are satisfied:

• the control variate $V$ largely explains the random variable $Y$;
• the relationship between $Y$ and $V$ is almost linear.

In the previous example, these conditions are largely satisfied and the residuals are very small$^{37}$.


Remark 165 In practice, we don't know the optimal value $c^\star$. However, the previous framework helps us to estimate it. Indeed, we have:

\[ c^\star = -\hat{\beta} \]

where $\hat{\beta}$ is the OLS estimate associated with the linear regression model:

\[ Y_s = \alpha + \beta V_s + u_s \]

Because $Y_s$ and $V_s$ are the simulated values of $Y$ and $V$, this implies that $c^\star$ is calculated at the final step of the Monte Carlo method.
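
The estimation of $c^\star$ at the final step of the simulation can be sketched on the case $Y = e^X$ and $V = X$ with $X \sim \mathcal{U}_{[0,1]}$. The code below is illustrative: the function name `control_variate_estimate` is hypothetical, and the OLS slope is computed directly from the sample covariance and variance.

```python
import math
import random

def control_variate_estimate(n_s, seed=0):
    """Estimate E[exp(X)], X ~ U[0,1], with the control variate V = X (E[V] = 1/2)."""
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_s)]
    y = [math.exp(x) for x in v]
    y_bar, v_bar = sum(y) / n_s, sum(v) / n_s
    cov_yv = sum((a - y_bar) * (b - v_bar) for a, b in zip(y, v)) / (n_s - 1)
    var_v = sum((b - v_bar) ** 2 for b in v) / (n_s - 1)
    var_y = sum((a - y_bar) ** 2 for a in y) / (n_s - 1)
    beta_hat = cov_yv / var_v  # OLS slope of Y on V, so that c = -beta_hat
    z = [a - beta_hat * (b - 0.5) for a, b in zip(y, v)]
    z_bar = sum(z) / n_s
    var_z = sum((c - z_bar) ** 2 for c in z) / (n_s - 1)
    return y_bar, z_bar, beta_hat, var_z / var_y

mc, cv, beta_hat, ratio = control_variate_estimate(10000)
print(f"MC = {mc:.4f}, CV = {cv:.4f}, exact = {math.e - 1:.4f}")
print(f"beta_hat = {beta_hat:.4f}, variance ratio = {ratio:.3%}")
```

The estimated slope should be close to $18 - 6e \approx 1.6903$ and the variance ratio close to 1.6%, in line with the analytical values of Example 154.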


We recall that the price of an arithmetic Asian call option is given by:

\[ C = e^{-rT} \, \mathbb{E}\left[ \left( \bar{S} - K \right)^+ \right] \]

where $K$ is the strike of the option and $\bar{S}$ denotes the average of $S(t)$ on a given number of fixing dates$^{38}$ $\{t_1, \ldots, t_{n_F}\}$:

\[ \bar{S} = \frac{1}{n_F} \sum_{m=1}^{n_F} S(t_m) \]

We can estimate the option price using the Black-Scholes model. We can also reduce the variance of the MC estimator by considering the following control variates:

1. the terminal value $V_1 = S(T)$ of the underlying asset;
2. the average value $V_2 = \bar{S}$;
3. the discounted payoff of the call option $V_3 = e^{-rT} (S(T) - K)^+$;
4. the discounted payoff of the geometric Asian call option $V_4 = e^{-rT} (\tilde{S} - K)^+$ where:

\[ \tilde{S} = \left( \prod_{m=1}^{n_F} S(t_m) \right)^{1/n_F} \]


37 The variance of residuals represents 1.628% of the variance of $Y$.
38 We have $t_{n_F} = T$.
For these control variates, we know the expected value. In the first and second cases, we have:

\[ \mathbb{E}\left[ S(T) \right] = S_0 e^{rT} \]

and:

\[ \mathbb{E}\left[ \bar{S} \right] = \frac{S_0}{n_F} \sum_{m=1}^{n_F} e^{r t_m} \]

The expected value of the third control variate is the Black-Scholes formula of the European call option. For the last control variate, we have:

\[ \tilde{S} = \left( \prod_{m=1}^{n_F} S_0 \, e^{\left( r - \frac{1}{2} \sigma^2 \right) t_m + \sigma W(t_m)} \right)^{1/n_F} = S_0 \cdot \exp\left( \left( r - \frac{1}{2} \sigma^2 \right) \bar{t} + \sigma \bar{W} \right) \]

where:

\[ \bar{t} = \frac{1}{n_F} \sum_{m=1}^{n_F} t_m \]

and:

\[ \bar{W} = \frac{1}{n_F} \sum_{m=1}^{n_F} W(t_m) \]

Because $\tilde{S}$ has a log-normal distribution, we deduce that the expected value of the fourth control variate is also given by a Black-Scholes formula$^{39}$. We consider the following parameters: $S_0 = 100$, $K = 104$, $r = 5\%$, $\sigma = 20\%$ and $T = 5$. The fixing dates of the Asian option are $t_1 = 1$, $t_2 = 2$, $t_3 = 3$, $t_4 = 4$ and $t_5 = 5$. In the top panels of Figure 13.39, we report the probability density function of the MC estimator $\hat{C}_{MC}$ and the CV estimator $\hat{C}_{CV}$ when the number of simulations is equal to 1000. The variance ratio $\operatorname{var}(\hat{C}_{CV}) / \operatorname{var}(\hat{C}_{MC})$ is respectively equal to 22.6% for $V_1 = S(T)$, 9.4% for $V_2 = \bar{S}$, 19.5% for $V_3 = e^{-rT} (S(T) - K)^+$ and 0.5% for $V_4 = e^{-rT} (\tilde{S} - K)^+$. In the bottom panels of Figure 13.39, we also show the relationship between the simulated value $Y = e^{-rT} (\bar{S} - K)^+$ and the control variates $V_1$ and $V_4$. We verify that the linear regression produces lower residuals for $V_4$ than for $V_1$.


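39 We have:

\[ \mathbb{E}\left[ \ln \tilde{S} \right] = \ln S_0 + \left( r - \frac{1}{2} \sigma^2 \right) \bar{t} \]

and:

\[ \operatorname{var}\left( \ln \tilde{S} \right) = \sigma^2 \upsilon \]

where:

\[ \upsilon = \frac{1}{n_F^2} \sum_{m=1}^{n_F} \sum_{m'=1}^{n_F} \min(t_m, t_{m'}) \]

We deduce that:

\[ \mathbb{E}\left[ e^{-rT} \left( \tilde{S} - K \right)^+ \right] = S_0 \, e^{\gamma - rT} \, \Phi\left( d + \sigma \sqrt{\upsilon} \right) - K e^{-rT} \, \Phi(d) \]

where:

\[ d = \frac{1}{\sigma \sqrt{\upsilon}} \left( \ln \frac{S_0}{K} + \left( r - \frac{1}{2} \sigma^2 \right) \bar{t} \right) \]

and:

\[ \gamma = r \bar{t} + \frac{1}{2} \sigma^2 (\upsilon - \bar{t}) \]

FIGURE 13.39: CV estimator of the arithmetic Asian call option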


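The previous approach can be extended to the case of several control variates:

\[ Z = Y + \sum_{i=1}^{n_{CV}} c_i \cdot (V_i - \mathbb{E}[V_i]) = Y + c^\top (V - \mathbb{E}[V]) \]

where $c = (c_1, \ldots, c_{n_{CV}})$ and $V = (V_1, \ldots, V_{n_{CV}})$. We can show that the optimal value of $c$ is equal to:

\[ c^\star = -\operatorname{cov}(V, V)^{-1} \cdot \operatorname{cov}(V, Y) \]

By noting that minimizing the variance of $Z$ is equivalent to minimizing the variance of $U$ where:

\[
\begin{aligned}
U &= Y - \hat{Y} \\
  &= Y - (\alpha + \beta^\top V)
\end{aligned}
\]

we deduce that $c^\star = -\beta$. It follows that:

\[
\begin{aligned}
\operatorname{var}(Z) &= \operatorname{var}(U) \\
  &= (1 - R^2) \cdot \operatorname{var}(Y)
\end{aligned}
\]

where $R^2$ is the R-squared coefficient of the linear regression $Y = \alpha + \beta^\top V + U$.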


Let us consider the previous example of the arithmetic Asian call option. In Table 13.5, 
we give the results of the linear regression by considering the combination of the four 
control variates. Previously, we found that the variance ratio was equal to 9.4% for the 
second control variate. If we combine the first three variates, this ratio becomes 3.5%. With 
the four control variates, the variance of the Monte Carlo estimator is divided by a factor 
of 500! 
TABLE 13.5: Linear regression between the Asian call option and the control variates

alpha_hat   beta_hat_1   beta_hat_2   beta_hat_3   beta_hat_4   R^2   1 - R^2

—51.482 0.036 0.538 90.7% 9.3% 

—24.025 —0.346 0.595 0.548 96.5% 3.5% 

—4.141 0.069 0.410 81.1% 18.9% 

—38.727 0.428 0.174 92.9% 7.1% 

—1.559 —0.040 0.054 0.111 0.905 99.8% 0.2% 


Remark 166 The reader may consult the book of Lamberton and Lapeyre (2007) for other examples of control variates in option pricing. In particular, they show how to use the put-call parity formula for reducing the volatility by noting that the variance of put options is generally smaller than the variance of call options.


13.3.2.3 Importance sampling 


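Let $X = (X_1, \ldots, X_n)$ be a random vector with distribution function $\mathbf{F}$. We have:

\[
\begin{aligned}
I &= \mathbb{E}\left[ \varphi(X_1, \ldots, X_n) \mid \mathbf{F} \right] \\
  &= \int \cdots \int \varphi(x_1, \ldots, x_n) \, f(x_1, \ldots, x_n) \, \mathrm{d}x_1 \cdots \mathrm{d}x_n
\end{aligned}
\]

where $f(x_1, \ldots, x_n)$ is the probability density function of $X$. It follows that:

\[
\begin{aligned}
I &= \int \cdots \int \varphi(x_1, \ldots, x_n) \, \frac{f(x_1, \ldots, x_n)}{g(x_1, \ldots, x_n)} \, g(x_1, \ldots, x_n) \, \mathrm{d}x_1 \cdots \mathrm{d}x_n \\
  &= \mathbb{E}\left[ \varphi(X_1, \ldots, X_n) \, \frac{f(X_1, \ldots, X_n)}{g(X_1, \ldots, X_n)} \;\middle|\; \mathbf{G} \right] \\
  &= \mathbb{E}\left[ \varphi(X_1, \ldots, X_n) \, \mathcal{L}(X_1, \ldots, X_n) \mid \mathbf{G} \right] \qquad (13.9)
\end{aligned}
\]

where $g(x_1, \ldots, x_n)$ is the probability density function of $\mathbf{G}$ and $\mathcal{L}$ is the likelihood ratio:

\[ \mathcal{L}(x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n)}{g(x_1, \ldots, x_n)} \]

The values taken by $\mathcal{L}(x_1, \ldots, x_n)$ are also called the importance sampling weights. Using the vector notation, the relationship (13.9) becomes:

\[ \mathbb{E}\left[ \varphi(X) \mid \mathbf{F} \right] = \mathbb{E}\left[ \varphi(X) \, \mathcal{L}(X) \mid \mathbf{G} \right] \]

It follows that:

\[ \mathbb{E}\left[ \hat{I}_{MC} \right] = \mathbb{E}\left[ \hat{I}_{IS} \right] = I \]

where $\hat{I}_{MC}$ and $\hat{I}_{IS}$ are the Monte Carlo and importance sampling estimators of $I$. We also deduce that$^{40}$:

\[ \operatorname{var}\left( \hat{I}_{IS} \right) = \operatorname{var}\left( \varphi(X) \, \mathcal{L}(X) \mid \mathbf{G} \right) \]


40 Recall that we use the vector notation, meaning that $x = (x_1, \ldots, x_n)$.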
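It follows that:

\[
\begin{aligned}
\operatorname{var}\left( \hat{I}_{IS} \right) &= \mathbb{E}\left[ \varphi^2(X) \, \mathcal{L}^2(X) \mid \mathbf{G} \right] - \mathbb{E}^2\left[ \varphi(X) \, \mathcal{L}(X) \mid \mathbf{G} \right] \\
  &= \int \varphi^2(x) \, \mathcal{L}^2(x) \, g(x) \, \mathrm{d}x - I^2 \\
  &= \int \varphi^2(x) \, \frac{f^2(x)}{g(x)} \, \mathrm{d}x - I^2 \qquad (13.10)
\end{aligned}
\]

If we compare the variance of the two estimators $\hat{I}_{MC}$ and $\hat{I}_{IS}$, we obtain:

\[
\begin{aligned}
\operatorname{var}\left( \hat{I}_{IS} \right) - \operatorname{var}\left( \hat{I}_{MC} \right) &= \int \varphi^2(x) \, \frac{f^2(x)}{g(x)} \, \mathrm{d}x - \int \varphi^2(x) \, f(x) \, \mathrm{d}x \\
  &= \int \varphi^2(x) \left( \mathcal{L}(x) - 1 \right) f(x) \, \mathrm{d}x
\end{aligned}
\]

The difference may be negative if the weights $\mathcal{L}(x)$ are small ($\mathcal{L}(x) \ll 1$) because the values of $\varphi^2(x) f(x)$ are positive. The importance sampling approach then changes the importance of some values $x$ by transforming the original probability distribution $\mathbf{F}$ into another probability distribution $\mathbf{G}$. Equation (13.10) is also interesting because it gives us some insights about the optimal IS distribution$^{41}$:

\[
\begin{aligned}
g^\star(x) &= \arg\min \operatorname{var}\left( \hat{I}_{IS} \right) \\
  &= \arg\min \int \varphi^2(x) \, \frac{f^2(x)}{g(x)} \, \mathrm{d}x \\
  &= c \cdot |\varphi(x)| \cdot f(x)
\end{aligned}
\]

where $c$ is the normalizing constant such that $\int g^\star(x) \, \mathrm{d}x = 1$. A good choice of the IS density $g(x)$ is then an approximation of $|\varphi(x)| \cdot f(x)$ such that $g(x)$ can easily be simulated.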


Remark 167 In order to simplify the notation and avoid confusion, we consider that $X \sim \mathbf{F}$ and $Z \sim \mathbf{G}$ in the sequel. This means that $\hat{I}_{MC} = \varphi(X)$ and $\hat{I}_{IS} = \varphi(Z) \, \mathcal{L}(Z)$.


We consider the estimation of the probability $p = \Pr\{X \ge 3\}$ when $X \sim \mathcal{N}(0, 1)$. We have:

\[ \varphi(x) = \mathbb{1}\{x \ge 3\} \]

Because the probability $p$ is low ($\Pr\{X \ge 3\} \approx 0.1350\%$), the MC estimator will not be efficient. Indeed, it will be rare to simulate a random variate greater than 3. To reduce the variance of the MC estimator, we can use importance sampling with $Z \sim \mathcal{N}(\mu_z, \sigma_z^2)$. For $\mu_z = 3$ and $\sigma_z = 1$, we report in Figure 13.40 the histogram of the estimators$^{42}$ $\hat{p}_{MC}$ and


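41 The first-order condition is:

\[ -\varphi^2(x) \, \frac{f^2(x)}{g^2(x)} + \lambda = 0 \]

where $\lambda$ is a constant.
42 We have:

\[ \hat{p}_{MC} = \frac{1}{n_S} \sum_{s=1}^{n_S} \mathbb{1}\{X_s \ge 3\} \]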
FIGURE 13.40: Histogram of the MC and IS estimators ($n_S = 1000$)


$\hat{p}_{IS}$ when the number of simulations is equal to 1000. It is obvious that the IS estimator is better than the MC estimator. To explain that, we report the probability density function of $X$ and $Z$ in the top/left panel in Figure 13.40. Whereas $\Pr\{X \ge 3\}$ is close to zero, the probability $\Pr\{Z \ge 3\}$ is equal to 50%. Therefore, it is easier to simulate $Z \ge 3$, but we have to apply a correction to obtain the right probability. This correction is given by the likelihood ratio, which is represented in the top/right panel. In Figure 13.41, we show the standard deviation $\sigma(\hat{p}_{IS})$ for different values of $\mu_z$ and $\sigma_z$. When $\sigma_z = 1$ and $\mu_z \in [0, 5]$, it is lower than the standard deviation of $\hat{p}_{MC}$. For $\mu_z = 3$, the variance ratio is approximately equal to 1%, meaning that the variance of $\hat{p}_{MC}$ is divided by a factor of 100. We also notice that we reduce the variance by using a higher value of $\sigma_z$. In fact, we can anticipate that the IS estimator is more efficient than the MC estimator if the following condition holds:

\[ \Pr\{Z \ge 3\} > \Pr\{X \ge 3\} \]

The calculation of the optimal values of $\mu_z$ and $\sigma_z$ is derived in Exercise 13.4.9 on page 891.
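
The rare event example can be sketched in Python by sampling $Z \sim \mathcal{N}(\mu_z, \sigma_z^2)$ and weighting each hit by the ratio of the two Gaussian densities. The function names below are hypothetical, and the defaults $\mu_z = 3$, $\sigma_z = 1$ are those used in the text.

```python
import random
from statistics import NormalDist

def tail_probability_is(n_s, mu_z=3.0, sigma_z=1.0, threshold=3.0, seed=0):
    """Importance sampling estimate of Pr{X >= threshold} for X ~ N(0,1)."""
    rng = random.Random(seed)
    f = NormalDist(0.0, 1.0)      # original distribution F
    g = NormalDist(mu_z, sigma_z)  # importance sampling distribution G
    total = 0.0
    for _ in range(n_s):
        z = rng.gauss(mu_z, sigma_z)
        if z >= threshold:
            total += f.pdf(z) / g.pdf(z)  # likelihood ratio L(z)
    return total / n_s

def tail_probability_mc(n_s, threshold=3.0, seed=0):
    """Plain Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_s) if rng.gauss(0.0, 1.0) >= threshold)
    return hits / n_s

exact = 1.0 - NormalDist().cdf(3.0)
print(f"exact = {exact:.6f}")
print(f"MC    = {tail_probability_mc(1000):.6f}")
print(f"IS    = {tail_probability_is(1000):.6f}")
```

With 1000 simulations the plain estimator is a multiple of 0.001 and often equals 0 or 0.001, whereas about half of the importance sampling draws contribute a small weighted amount.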


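and:

\[ \hat{p}_{IS} = \frac{1}{n_S} \sum_{s=1}^{n_S} \mathbb{1}\{Z_s \ge 3\} \cdot \mathcal{L}(Z_s) \]

where:

\[ \mathcal{L}(z) = \sigma_z \, \frac{\phi(z)}{\phi\left( \frac{z - \mu_z}{\sigma_z} \right)} \]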


FIGURE 13.41: Standard deviation (in %) of the estimator $\hat{p}_{IS}$ as a function of $\mu_z$ ($n_S = 1000$)

Remark 168 The previous example is an illustration of rare event simulation. This is why importance sampling is related to the theory of large deviations. Many results of this statistical field (Cramér's theorem, Berry-Esseen bounds) are then obtained using the same approach as the importance sampling method.


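We consider the pricing of the put option:

\[ P = e^{-rT} \, \mathbb{E}\left[ \left( K - S(T) \right)^+ \right] \]

We can estimate the option price by using the Monte Carlo method with:

\[ \varphi(x) = e^{-rT} (K - x)^+ \]

In the case where $K \ll S(0)$, the probability of exercise $\Pr\{S(T) \le K\}$ is very small. Therefore, we have to increase the probability of exercise in order to obtain a more efficient estimator. In the case of the Black-Scholes model, the density function of $S(T)$ is equal to:

\[ f(x) = \frac{1}{x \sigma_x} \, \phi\left( \frac{\ln x - \mu_x}{\sigma_x} \right) \]

where $\mu_x = \ln S_0 + (r - \sigma^2/2) T$ and $\sigma_x = \sigma \sqrt{T}$. Using the same approach as previously, we consider the IS density $g(x)$ defined by:

\[ g(x) = \frac{1}{x \sigma_z} \, \phi\left( \frac{\ln x - \mu_z}{\sigma_z} \right) \]

where $\mu_z = \theta + \mu_x$ and $\sigma_z = \sigma_x$. For instance, we can choose $\theta$ such that the probability of exercise is equal to 50%. It follows that:

\[ \theta = \ln K - \mu_x = \ln \frac{K}{S_0} - \left( r - \frac{1}{2} \sigma^2 \right) T \]

We deduce that:

\[ P = \mathbb{E}\left[ e^{-rT} \left( K - S'(T) \right)^+ \cdot \mathcal{L}\left( S'(T) \right) \right] \]

where:

\[
\begin{aligned}
\mathcal{L}(x) &= \frac{\dfrac{1}{x \sigma_x} \, \phi\left( \dfrac{\ln x - \mu_x}{\sigma_x} \right)}{\dfrac{1}{x \sigma_x} \, \phi\left( \dfrac{\ln x - \mu_z}{\sigma_x} \right)} \\
  &= \exp\left( \frac{\theta^2}{2 \sigma_x^2} - \left( \frac{\ln x - \mu_x}{\sigma_x} \right) \cdot \frac{\theta}{\sigma_x} \right)
\end{aligned}
\]

and $S'(T)$ is the same geometric Brownian motion as $S(T)$, but with another initial value:

\[ S'(0) = S(0) \, e^{\theta} = K e^{-(r - \sigma^2/2) T} \]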

Example 155 We assume that $S_0 = 100$, $K = 60$, $r = 5\%$, $\sigma = 20\%$ and $T = 2$. If we consider the previous method, the IS process is simulated using the initial value $S'(0) = K e^{-(r - \sigma^2/2) T} = 56.506$, whereas the value of $\theta$ is equal to $-0.5708$. In Figure 13.42, we report the density function of the estimators $\hat{P}_{MC}$ and $\hat{P}_{IS}$ when the number of simulations is equal to 1000. For this example, the variance ratio is equal to 1.77%, meaning that the IS method has reduced the variance of the MC estimator by a factor greater than 50. If we use another IS scheme with $S'(0) = 80$, the reduction is less important, but remains significant$^{43}$.
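
The importance sampling scheme of Example 155 can be sketched as follows. The code simulates $\ln S'(T)$ directly, applies the likelihood ratio given above, and compares the result with the closed-form Black-Scholes put price, which is used here only as a benchmark; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def put_price_is(s0, k, r, sigma, t, n_s, seed=0):
    """IS estimate of the put price with theta chosen so that Pr{S'(T) <= K} = 50%."""
    rng = random.Random(seed)
    mu_x = math.log(s0) + (r - 0.5 * sigma ** 2) * t
    sigma_x = sigma * math.sqrt(t)
    theta = math.log(k) - mu_x  # shifts the median of S'(T) to the strike
    total = 0.0
    for _ in range(n_s):
        log_s = mu_x + theta + sigma_x * rng.gauss(0.0, 1.0)  # ln S'(T)
        payoff = max(k - math.exp(log_s), 0.0)
        weight = math.exp(theta ** 2 / (2.0 * sigma_x ** 2)
                          - theta * (log_s - mu_x) / sigma_x ** 2)
        total += payoff * weight
    return math.exp(-r * t) * total / n_s, theta, s0 * math.exp(theta)

def put_price_bs(s0, k, r, sigma, t):
    """Black-Scholes price of the European put option (benchmark)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = NormalDist().cdf
    return k * math.exp(-r * t) * cdf(-d2) - s0 * cdf(-d1)

price, theta, s_prime = put_price_is(100.0, 60.0, 0.05, 0.20, 2.0, 1000)
print(f"theta = {theta:.4f}, S'(0) = {s_prime:.3f}")
print(f"IS price = {price:.4f}, BS price = {put_price_bs(100.0, 60.0, 0.05, 0.20, 2.0):.4f}")
```

The values of $\theta$ and $S'(0)$ printed by the sketch can be compared with the $-0.5708$ and $56.506$ reported in the example.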


13.3.2.4 Other methods 


We mention two other methods, which are less used in risk management than the previous methods, but may be very efficient for some financial problems. The first method is known as the conditional Monte Carlo method. Recall that $I = \mathbb{E}[Y]$ where $Y = \varphi(X_1, \ldots, X_n)$. Let $Z$ be a random vector and $V = \mathbb{E}[Y \mid Z]$ be the conditional expectation of $Y$ with respect to $Z$. It follows that:

\[
\begin{aligned}
\mathbb{E}[V] &= \int \mathbb{E}[Y \mid Z = z] \, f_Z(z) \, \mathrm{d}z \\
  &= \mathbb{E}[Y] \\
  &= I
\end{aligned}
\]

where $f_Z$ is the probability density function of $Z$. Recall that:

\[ \operatorname{var}(Y) = \mathbb{E}\left[ \operatorname{var}(Y \mid Z) \right] + \operatorname{var}\left( \mathbb{E}[Y \mid Z] \right) \]

We deduce that:

\[
\begin{aligned}
\operatorname{var}(V) &= \operatorname{var}\left( \mathbb{E}[Y \mid Z] \right) \\
  &= \operatorname{var}(Y) - \mathbb{E}\left[ \operatorname{var}(Y \mid Z) \right] \\
  &\le \operatorname{var}(Y)
\end{aligned}
\]

because $\operatorname{var}(Y \mid Z) \ge 0$ implies that $\mathbb{E}[\operatorname{var}(Y \mid Z)] \ge 0$. The idea of the conditional Monte Carlo method is then to simulate $V$ instead of $Y$ in order to reduce the variance. For that, we have to find $Z$ such that $\mathbb{E}[Y \mid Z]$ can easily be sampled. It can be the case with some stochastic volatility models.


43 The variance ratio is equal to 13.59%.

FIGURE 13.42: Density function of the estimators $\hat{P}_{MC}$ and $\hat{P}_{IS}$ ($n_S = 1000$)


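Example 156 Let $X = (X_1, X_2)$ be a standardized Gaussian random vector with correlation $\rho$. We want to calculate $p = \Pr\{X_1 \le a X_2 + b\}$. We have:

\[
\begin{aligned}
Y &= \varphi(X_1, X_2) \\
  &= \mathbb{1}\{X_1 \le a X_2 + b\}
\end{aligned}
\]

and:

\[ \hat{p}_{MC} = \frac{1}{n_S} \sum_{s=1}^{n_S} \mathbb{1}\{X_{1,s} \le a X_{2,s} + b\} \]

If we consider $Z = X_2$, we obtain:

\[
\begin{aligned}
V &= \mathbb{E}[Y \mid Z] \\
  &= \mathbb{E}\left[ \mathbb{1}\{X_1 \le a X_2 + b\} \mid X_2 = x_2 \right]
\end{aligned}
\]

Because we have $X_2 = \rho X_1 + \sqrt{1 - \rho^2} X_3$ where $X_3 \sim \mathcal{N}(0, 1)$ is independent from $X_1$, we deduce that:

\[
\begin{aligned}
V &= \mathbb{E}\left[ \mathbb{1}\left\{ X_1 \le a \left( \rho X_1 + \sqrt{1 - \rho^2} X_3 \right) + b \right\} \;\middle|\; X_3 = x_3 \right] \\
  &= \Phi\left( \frac{a \sqrt{1 - \rho^2} \, x_3 + b}{1 - a \rho} \right)
\end{aligned}
\]

The conditional Monte Carlo (CMC) estimator is then equal to$^{44}$:

\[ \hat{p}_{CMC} = \frac{1}{n_S} \sum_{s=1}^{n_S} \Phi\left( \frac{a \sqrt{1 - \rho^2} \, X_{3,s} + b}{1 - a \rho} \right) \]

where $X_{3,s} \sim \mathcal{N}(0, 1)$. In Table 13.6, we report the variance ratio between the CMC and MC estimators when $a$ is equal to 1. We verify that the CMC estimator is particularly efficient when $\rho$ is negative. For instance, the variance is divided by a factor of 70 when $\rho$ is equal to $-90\%$ and $b$ is equal to 3.0.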
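
The CMC estimator of Example 156 can be sketched as follows. The code conditions on $X_3$, as in the last expression of $V$, and assumes $1 - a\rho > 0$ so that the inequality keeps its direction; the exact probability uses the fact that $X_1 - a X_2$ is Gaussian with variance $1 + a^2 - 2 a \rho$. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def cmc_probability(a, b, rho, n_s, seed=0):
    """Conditional MC estimate of Pr{X1 <= a*X2 + b}; assumes 1 - a*rho > 0."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    scale = a * math.sqrt(1.0 - rho ** 2)
    v = [cdf((scale * rng.gauss(0.0, 1.0) + b) / (1.0 - a * rho)) for _ in range(n_s)]
    p_hat = sum(v) / n_s
    var_v = sum((x - p_hat) ** 2 for x in v) / (n_s - 1)  # variance of one draw of V
    return p_hat, var_v

def exact_probability(a, b, rho):
    """Closed form: X1 - a*X2 ~ N(0, 1 + a^2 - 2*a*rho)."""
    return NormalDist().cdf(b / math.sqrt(1.0 + a * a - 2.0 * a * rho))

for rho, b in [(0.0, 0.0), (-0.9, 3.0)]:
    p_hat, var_v = cmc_probability(1.0, b, rho, 20000)
    p = exact_probability(1.0, b, rho)
    ratio = var_v / (p * (1.0 - p))  # the variance of one indicator draw is p(1 - p)
    print(f"rho = {rho:+.1f}, b = {b}: p_hat = {p_hat:.4f}, p = {p:.4f}, ratio = {ratio:.1%}")
```

For $\rho = 0$ and $b = 0$, $V = \Phi(X_3)$ is uniform on $[0, 1]$, so the variance ratio is $(1/12) / (1/4) = 33.3\%$, which matches the corresponding entry of Table 13.6.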


TABLE 13.6: Variance ratio (in %) when $a = 1$

                        Correlation rho (in %)
      b    -90.0  -75.0  -50.0  -25.0    0.0   25.0   50.0   75.0   90.0
    0.0      3.2    8.0   16.1   24.5   33.3   43.0   54.0   67.8   79.8
    1.0      2.9    7.3   14.8   22.5   30.6   39.3   48.9   60.0   67.6
    2.0      2.2    5.6   11.3   17.2   23.3   29.6   35.9   41.9   48.8
    3.0      1.4    3.4    7.0   10.7   14.4   18.1   21.4   24.6


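The second method is stratified sampling. Recall that $X \in \Omega$. Let $\{\Omega_j, j = 1, \ldots, m\}$ be a partition$^{45}$ of $\Omega$. We have:

\[
\begin{aligned}
I &= \mathbb{E}\left[ \varphi(X) \right] \\
  &= \int_{\Omega} \varphi(x) f(x) \, \mathrm{d}x \\
  &= \sum_{j=1}^{m} \int_{\Omega_j} \varphi(x) f(x) \, \mathrm{d}x \\
  &= \sum_{j=1}^{m} \mathbb{E}\left[ \mathbb{1}\{X \in \Omega_j\} \cdot \varphi(X) \right] \\
  &= \sum_{j=1}^{m} \Pr\{X \in \Omega_j\} \cdot \mathbb{E}\left[ \varphi(X) \mid X \in \Omega_j \right]
\end{aligned}
\]

We introduce the index random variable $B$:

\[ B = j \Leftrightarrow X \in \Omega_j \]


44 If $a \rho = 1$, we have:

\[ \hat{p}_{CMC} = \frac{1}{n_S} \sum_{s=1}^{n_S} \mathbb{1}\left\{ a \sqrt{1 - \rho^2} X_{3,s} + b \ge 0 \right\} = \hat{p}_{MC} \]

45 This means that $\Omega_j \cap \Omega_k = \emptyset$ and $\bigcup_{j=1}^{m} \Omega_j = \Omega$.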
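We note $p(j) = \Pr\{B = j\} = \Pr\{X \in \Omega_j\}$ and $X(j)$ the random vector, whose probability distribution is the conditional law of $X \mid X \in \Omega_j$. It follows that:

\[ I = \sum_{j=1}^{m} p(j) \cdot I(j) \]

where $I(j) = \mathbb{E}[\varphi(X(j))] = \mathbb{E}[\varphi(X) \mid B = j]$. We define the stratified sampling estimator as follows:

\[ \hat{I}_{STR} = \sum_{j=1}^{m} p(j) \cdot \bar{Y}(j) \]

where:

\[ \bar{Y}(j) = \frac{1}{n_S(j)} \sum_{s=1}^{n_S(j)} Y_s(j) = \frac{1}{n_S(j)} \sum_{s=1}^{n_S(j)} \varphi(X_s(j)) \]

Recall that the MC estimator is equal to:

\[ \hat{I}_{MC} = \bar{Y} \]

where:

\[ \bar{Y} = \frac{1}{n_S} \sum_{s=1}^{n_S} Y_s = \frac{1}{n_S} \sum_{s=1}^{n_S} \varphi(X_s) \]

The MC estimator can be viewed as a stratified sampling estimator with only one stratum: $\Omega_1 = \Omega$. On the contrary, the STR estimator depends on the number $m$ of strata and the distribution of strata.

Like the MC estimator, it is easy to show that the stratified sampling estimator is unbiased$^{46}$:

\[ \mathbb{E}\left[ \hat{I}_{STR} \right] = \sum_{j=1}^{m} p(j) \cdot \mathbb{E}\left[ \bar{Y}(j) \right] = \sum_{j=1}^{m} p(j) \cdot I(j) = I \]

We introduce the following notations:

1. the conditional expectation $\mu(j)$ is defined as:

\[ \mu(j) = \mathbb{E}\left[ \varphi(X(j)) \right] = \mathbb{E}\left[ \varphi(X) \mid B = j \right] \]

2. the conditional variance $\sigma^2(j)$ is equal to:

\[ \sigma^2(j) = \operatorname{var}\left( \varphi(X(j)) \right) = \operatorname{var}\left( \varphi(X) \mid B = j \right) \]

Using the conditional independence of the random variables $X(j)$, it follows that:

\[ \operatorname{var}\left( \hat{I}_{STR} \right) = \sum_{j=1}^{m} \frac{p^2(j) \cdot \sigma^2(j)}{n_S(j)} \qquad (13.11) \]

and:

\[ \operatorname{var}\left( \hat{I}_{MC} \right) = \frac{1}{n_S} \sum_{j=1}^{m} \left( p(j) \cdot \sigma^2(j) + p(j) \cdot \left( \mu(j) - \bar{\mu} \right)^2 \right) \qquad (13.12) \]


46 We assume that $n_S(j) \ne 0$.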
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where: Ea 
i= D/P) uC) 


Using Equations (13.11) and (13.12), it is not possible to compare directly the variance 
of the two estimators because the stratified sampling estimator depends on the allocation 


(ng (1),...,ng(m)). Therefore, we can have var (îs) < var (fc) or var (îs) > 


var (imc). However, for many allocation schemes, the stratified sampling approach is an 
efficient method to reduce the variance of the MC estimator. 


To illustrate the interest of the stratified sampling approach, we consider the propor- 
tional allocation: 


ns (j) = ns - p(i) 
It follows that: 


var (srr) = o SG) -0° (i) 


and: 
var (ic) = DEO P 0+ Yop) - 
= væ (Isr) + +20) (H(i) - a) 


> var (îs) 


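Therefore, with the proportional allocation, the variance of the stratified sampling estimator is never larger than that of the Monte Carlo estimator. In this case, we notice that:

$$\operatorname{var}(\hat{I}_{STR}) = \frac{1}{n_S} \cdot \mathbb{E}[\operatorname{var}(\varphi(X) \mid B)]$$

and:

$$\operatorname{var}(\hat{I}_{MC}) = \underbrace{\frac{1}{n_S} \cdot \mathbb{E}[\operatorname{var}(\varphi(X) \mid B)]}_{\text{intra-strata variance}} + \underbrace{\frac{1}{n_S} \cdot \operatorname{var}(\mathbb{E}[\varphi(X) \mid B])}_{\text{inter-strata variance}}$$
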
The stratified sampling approach with a proportional allocation consists of removing the 
inter-strata variance in order to keep only the intra-strata variance. This result gives some 
ideas about the optimal strata. Indeed, the variance reduction is high if the intra-strata 
variance is low. 
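
The variance decomposition above can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes $X \sim U_{[0,1]}$, $\varphi(x) = x$ and $m$ equal-width strata with $p(j) = 1/m$ (illustrative choices), and the function names, seed and sample sizes are hypothetical. It estimates $n_S \cdot \operatorname{var}$ of both estimators over independent replications.

```python
import random
import statistics

def mc_estimate(phi, n_s, rng):
    """Plain Monte Carlo estimate of E[phi(X)] with X ~ U[0, 1]."""
    return sum(phi(rng.random()) for _ in range(n_s)) / n_s

def str_estimate(phi, n_s, m, rng):
    """Stratified estimate with m equal-width strata, p(j) = 1/m,
    and proportional allocation n_S(j) = n_S * p(j)."""
    n_j = n_s // m
    total = 0.0
    for j in range(1, m + 1):
        # (j - 1 + U) / m is uniformly distributed on the j-th stratum
        y_j = sum(phi((j - 1 + rng.random()) / m) for _ in range(n_j)) / n_j
        total += y_j / m  # p(j) * Y_hat(j)
    return total

def scaled_variance(estimator, n_s, n_rep):
    """Empirical n_S * var(estimator) over n_rep independent replications."""
    return n_s * statistics.pvariance([estimator() for _ in range(n_rep)])

rng = random.Random(42)          # fixed seed, for reproducibility only
phi = lambda x: x
n_s, m, n_rep = 100, 10, 2000    # illustrative sizes (assumptions)
v_mc = scaled_variance(lambda: mc_estimate(phi, n_s, rng), n_s, n_rep)
v_str = scaled_variance(lambda: str_estimate(phi, n_s, m, rng), n_s, n_rep)
print(f"n_S * var(MC)  = {v_mc:.5f}")
print(f"n_S * var(STR) = {v_str:.5f}")
```

Under these assumptions the first value should be close to $1/12$ (total variance) and the second close to $1/(12 m^2)$ (intra-strata variance only), up to simulation noise.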


We now consider that the strata are given. We write the allocation as follows:

$$n_S(j) = n_S \cdot q(j)$$

where the $q(j)$'s are arbitrary frequencies such that $\sum_{j=1}^{m} q(j) = 1$. To find the optimal allocation $q^\star$, we have to solve the following variance minimization problem:

$$q^\star = \arg\min \operatorname{var}(\hat{I}_{STR})$$

subject to the constraint $\sum_{j=1}^{m} q(j) = 1$. It follows that the Lagrange function is equal to:

$$\mathcal{L}(q; \lambda) = \frac{1}{n_S} \sum_{j=1}^{m} \frac{p^2(j) \cdot \sigma^2(j)}{q(j)} + \lambda \cdot \left( \sum_{j=1}^{m} q(j) - 1 \right)$$
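We deduce that the optimal allocation is⁴⁷:

$$q^\star(j) = \frac{p(j) \cdot \sigma(j)}{\sum_{k=1}^{m} p(k) \cdot \sigma(k)}$$

In this case, we obtain:

$$\operatorname{var}(\hat{I}_{STR}) = \frac{1}{n_S} \left( \sum_{j=1}^{m} p(j) \cdot \sigma(j) \right)^2 \leq \frac{1}{n_S} \sum_{j=1}^{m} p(j) \cdot \sigma^2(j)$$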

Example 157 We have $I = \int_0^1 \varphi(x)\,\mathrm{d}x = \mathbb{E}[\varphi(X)]$ where $X \sim U_{[0,1]}$. We consider the following cases for the function $\varphi(x)$:

1. $\varphi(x) = x$
2. $\varphi(x) = x^2$
3. $\varphi(x) = (1 + \cos(\pi x))/2$

These three functions are reported in Figure 13.43. In Table 13.7, we give the exact value of $I$ and the variance of the estimators. For the MC estimator and the function $\varphi(x) = x$, we verify that:

$$n_S \cdot \operatorname{var}(\hat{I}_{MC}) = \operatorname{var}(U_{[0,1]}) = \frac{1}{12}$$

For the STR estimator, we consider fixed-space strata $X(j) \in \left( \frac{j-1}{m}, \frac{j}{m} \right]$ implying that $p(j) = 1/m$. We can then simulate the conditional random variable $X(j)$ by using the following transformation:

$$X(j) = \frac{j-1}{m} + \frac{U}{m}$$

where $U \sim U_{[0,1]}$. In Table 13.7, we report $n_S \cdot \operatorname{var}(\hat{I}_{STR})$ when $m$ is equal to 10. We notice that the variance is approximately divided by 100 when we consider the proportional allocation $q(j) = p(j)$. To understand this result, we consider the function $\varphi(x) = x$. In this case, the variance of the stratum $j$ is equal to⁴⁸:

$$\sigma^2(j) = \mathbb{E}[X^2(j)] - \mathbb{E}^2[X(j)]$$


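We deduce that:

$$\sigma^2(j) = \int_{(j-1)/m}^{j/m} m x^2 \,\mathrm{d}x - \left( \int_{(j-1)/m}^{j/m} m x \,\mathrm{d}x \right)^2 = \frac{1}{12 m^2}$$

⁴⁷ The first-order condition is:

$$\frac{\partial \mathcal{L}(q; \lambda)}{\partial q(j)} = -\frac{1}{n_S} \cdot \frac{p^2(j) \cdot \sigma^2(j)}{q^2(j)} + \lambda = 0$$

implying that the ratio $p(j) \cdot \sigma(j) / q(j)$ is constant.

⁴⁸ The density function of $X(j)$ is $f(x) = m$.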


This implies that the variance of each stratum is equal to the variance of the uniform random variable divided by a factor of $m^2$. These intra-strata variances are given in Figure 13.44. For the function $\varphi(x) = x^2$, the variance of strata increases with the index $j$. This is expected if we consider the graphic representation of the function $\varphi(x)$ in Figure 13.43. Indeed, the convexity of $\varphi(x)$ implies that the function varies more within a stratum when $x$ increases. We have also calculated the variance of the STR estimator when we use the optimal allocation $q^\star$, which is reported in Figure 13.45. In this case, we allocate a larger number of simulations to strata that present more variance. This is natural, because these strata are more difficult to simulate than strata with low variance. However, the gain of the optimal allocation is not very significant with respect to the proportional allocation.
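
To make the comparison between the two allocations concrete, here is a small sketch that evaluates the formulas $\sum_j p(j) \sigma^2(j)$ and $\left(\sum_j p(j) \sigma(j)\right)^2$ in closed form for $\varphi(x) = x^2$ with $m$ equal-width strata. The helper names are hypothetical, and the moments of a uniform variable on a stratum are computed analytically rather than simulated, so the values need not match the simulated entries of Table 13.7 to the last digit.

```python
import math

def stratum_moment(k, a, b):
    """E[X^k] for X uniform on (a, b]."""
    return (b ** (k + 1) - a ** (k + 1)) / ((k + 1) * (b - a))

def allocation_variances(m):
    """Return n_S * var(I_STR) for phi(x) = x^2 under the proportional
    allocation q(j) = p(j) and under the optimal allocation q*(j),
    together with q* itself."""
    p = 1.0 / m
    sigma = []
    for j in range(1, m + 1):
        a, b = (j - 1) / m, j / m
        # var(X^2 | stratum j) = E[X^4] - E[X^2]^2
        var_j = stratum_moment(4, a, b) - stratum_moment(2, a, b) ** 2
        sigma.append(math.sqrt(var_j))
    proportional = sum(p * s * s for s in sigma)   # sum_j p(j) sigma^2(j)
    weighted = sum(p * s for s in sigma)           # sum_j p(j) sigma(j)
    optimal = weighted ** 2
    q_star = [p * s / weighted for s in sigma]     # ratio p(j) sigma(j) / q(j) constant
    return proportional, optimal, q_star

proportional, optimal, q_star = allocation_variances(10)
print(f"proportional: {proportional:.5f}, optimal: {optimal:.5f}")
print("q* (in %):", [round(100 * q, 2) for q in q_star])
```

The optimal frequencies $q^\star(j)$ increase with $j$, which is consistent with the remark that more simulations are allocated to the strata with more variance.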
FIGURE 13.43: Function $\varphi(x)$

In the case of the uniform distribution $U_{[0,1]}$, we have used fixed-space strata $X(j) \in \left( \frac{j-1}{m}, \frac{j}{m} \right]$, implying that the probability $p(j)$ is equal for all the strata. This is the most popular method for defining strata. In the case of a general probability distribution $F$, we define the conditional random variable $X(j)$ as follows:

$$X(j) = F^{-1}\left( \frac{j - 1 + U}{m} \right) \qquad (13.13)$$
TABLE 13.7: Comparison between MC and STR estimators

φ(x)                                  x          x^2        (1 + cos(πx))/2
I                                     0.50000    0.33333    0.50000
n_S · var(Î_MC)                       0.08333    0.08890    0.12501
n_S · var(Î_STR), q(j) = p(j)         0.00083    0.00113    0.00105
n_S · var(Î_STR), q(j) = q*(j)        0.00083    0.00083    0.00084
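where $U$ is a standard uniform random variate. We deduce that $X(j) \in \left[ F^{-1}\left( \frac{j-1}{m} \right), F^{-1}\left( \frac{j}{m} \right) \right]$ and:

$$p(j) = \Pr\left\{ X \in \left[ F^{-1}\left( \frac{j-1}{m} \right), F^{-1}\left( \frac{j}{m} \right) \right] \right\} = F\left( F^{-1}\left( \frac{j}{m} \right) \right) - F\left( F^{-1}\left( \frac{j-1}{m} \right) \right) = \frac{1}{m}$$

FIGURE 13.44: Intra-strata variance $\sigma^2(j)$ (in bps)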


In Figure 13.46, we have reported the strata defined by Equation (13.13) for different probability distributions when $m$ is equal to 10.

The previous method consists in defining strata in order to obtain equal probabilities $p(j)$. It can be very different from the optimal method, whose objective is to define strata such that the intra-strata variances $\sigma^2(j)$ are close to zero. In order to illustrate the two approaches, we consider the pricing of a European call option in the Black-Scholes model. Recall that the price of the call option is equal to:

$$C = e^{-rT} \cdot \mathbb{E}\left[ \max\left( 0, S_0 e^{\left( r - \frac{1}{2}\sigma^2 \right) T + \sigma \sqrt{T} X} - K \right) \right]$$
where $X \sim \mathcal{N}(0, 1)$. Let us apply the stratification method with strata defined by Equation (13.13). We have:

$$X(j) = \Phi^{-1}\left( \frac{j-1}{m} + \frac{U}{m} \right)$$

where $U \sim U_{[0,1]}$. We deduce that:

$$\hat{I}^{(1)}_{STR}(m) = \frac{e^{-rT}}{m} \sum_{j=1}^{m} \frac{1}{n_S(j)} \sum_{s=1}^{n_S(j)} \max\left( 0, S_0 e^{\left( r - \frac{1}{2}\sigma^2 \right) T + \sigma \sqrt{T} X_s(j)} - K \right)$$

However, we notice that $\max(0, S(T) - K)$ is equal to zero if:

$$S_0 e^{\left( r - \frac{1}{2}\sigma^2 \right) T + \sigma \sqrt{T} X} - K \leq 0 \quad \Leftrightarrow \quad X \leq x_1 = \frac{1}{\sigma \sqrt{T}} \left( \ln\left( \frac{K}{S_0} \right) - rT \right) + \frac{1}{2} \sigma \sqrt{T}$$

FIGURE 13.45: Optimal allocation $q^\star(j)$ (in %)

FIGURE 13.46: Strata for different random variables

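It is then natural to define the first stratum by $]-\infty, x_1]$, whose probability is $p(1) = \Phi(x_1)$. Indeed, we have $\varphi(X(1)) = 0$, implying that the intra-strata variance $\sigma^2(1)$ is equal to zero. For the other strata, we can use the previous approach. For $j \geq 2$, we have:

$$p(j) = \Pr\{X \leq x_j\} - \Pr\{X \leq x_{j-1}\} = \frac{1 - p(1)}{m - 1}$$

We deduce that:

$$\Pr\{X \leq x_j\} = p(1) + \frac{j-1}{m-1} \cdot (1 - p(1))$$

and:

$$x_j = \Phi^{-1}\left( p(1) + \frac{j-1}{m-1} \cdot (1 - p(1)) \right)$$

The $j$-th stratum is then defined by $X \in [x_{j-1}, x_j]$ with $x_0 = -\infty$, $x_m = +\infty$ and:

$$p(j) = \begin{cases} p(1) & \text{if } j = 1 \\ \dfrac{1 - p(1)}{m - 1} & \text{if } j \geq 2 \end{cases}$$

To simulate the conditional random variable $X(j)$ for $j \geq 2$, we use the following scheme:

$$X(j) = \Phi^{-1}\left( \Phi(x_{j-1}) + \left( \Phi(x_j) - \Phi(x_{j-1}) \right) \cdot U \right)$$

where $U \sim U_{[0,1]}$. We deduce that:

$$\hat{I}^{(2)}_{STR}(m) = e^{-rT} \sum_{j=2}^{m} \frac{p(j)}{n_S(j)} \sum_{s=1}^{n_S(j)} \left( S_0 e^{\left( r - \frac{1}{2}\sigma^2 \right) T + \sigma \sqrt{T} X_s(j)} - K \right)$$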


In the case of proportional allocation $n_S(j) = n_S \cdot p(j)$, we notice that the total number of simulations is reduced and equal to $(1 - p(1)) \cdot n_S$ because we don't have to simulate the first stratum.
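
The two stratification schemes can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the figures: it uses the parameter values of the numerical example ($S_0 = 100$, $K = 105$, $\sigma = 20\%$, $T = 1$, $r = 5\%$), the proportional allocation in both cases, and Python's `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$. The function names, the seed and the clamping constant are hypothetical choices. The closed-form Black-Scholes price is included only as a benchmark.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()
S0, K, SIGMA, T, R = 100.0, 105.0, 0.20, 1.0, 0.05

def bs_call():
    """Closed-form Black-Scholes price, used only as a benchmark."""
    d1 = (math.log(S0 / K) + (R + 0.5 * SIGMA**2) * T) / (SIGMA * math.sqrt(T))
    d2 = d1 - SIGMA * math.sqrt(T)
    return S0 * N.cdf(d1) - K * math.exp(-R * T) * N.cdf(d2)

def discounted_payoff(x):
    s_t = S0 * math.exp((R - 0.5 * SIGMA**2) * T + SIGMA * math.sqrt(T) * x)
    return math.exp(-R * T) * max(0.0, s_t - K)

def equal_probability_bounds(m):
    """Cumulative probabilities delimiting the strata of Equation (13.13)."""
    return [j / m for j in range(m + 1)]

def zero_payoff_bounds(m):
    """First stratum ]-inf, x_1] (zero payoff), then m - 1 equiprobable strata."""
    x1 = (math.log(K / S0) - R * T) / (SIGMA * math.sqrt(T)) + 0.5 * SIGMA * math.sqrt(T)
    p1 = N.cdf(x1)
    return [0.0] + [p1 + (j - 1) / (m - 1) * (1.0 - p1) for j in range(1, m + 1)]

def stratified_price(bounds, n_s, rng, skip_first=False):
    """Stratified estimator with proportional allocation n_S(j) = n_S * p(j)."""
    price = 0.0
    strata = list(zip(bounds[:-1], bounds[1:]))
    for lo, hi in (strata[1:] if skip_first else strata):
        p_j = hi - lo
        n_j = max(1, round(n_s * p_j))
        total = 0.0
        for _ in range(n_j):
            # X(j) = Phi^{-1}(Phi(x_{j-1}) + (Phi(x_j) - Phi(x_{j-1})) * U)
            u = lo + p_j * rng.random()
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
            total += discounted_payoff(N.inv_cdf(u))
        price += p_j * total / n_j
    return price

rng = random.Random(2024)   # fixed seed, for reproducibility only
m, n_s = 10, 10_000
price_1 = stratified_price(equal_probability_bounds(m), n_s, rng)
price_2 = stratified_price(zero_payoff_bounds(m), n_s, rng, skip_first=True)
print(f"benchmark = {bs_call():.4f}, STR(1) = {price_1:.4f}, STR(2) = {price_2:.4f}")
```

Skipping the first stratum in the second scheme reflects the fact that its payoff is identically zero, so it contributes nothing to the estimator and requires no simulation.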

We consider a European call option with the following parameters: $S_0 = 100$, $K = 105$, $\sigma = 20\%$, $T = 1$ and $r = 5\%$. In Figure 13.47, we have reported the variance of the two stratified estimators for different values of $m$ when the number of simulations $n_S$ is equal to 10000. In the case $m = 1$, we obtain the traditional MC estimator and we have:

$$\operatorname{var}\left( \hat{I}^{(1)}_{STR}(1) \right) = \operatorname{var}\left( \hat{I}^{(2)}_{STR}(1) \right) = \operatorname{var}(\hat{I}_{MC})$$

When $m > 1$, we obtain:

$$\operatorname{var}\left( \hat{I}^{(2)}_{STR}(m) \right) < \operatorname{var}\left( \hat{I}^{(1)}_{STR}(m) \right) < \operatorname{var}(\hat{I}_{MC})$$

The second stratified estimator is then more efficient than the first stratified estimator, because the design of the first stratum is optimal⁴⁹.

FIGURE 13.47: Variance of the two estimators $\hat{I}^{(1)}_{STR}(m)$ and $\hat{I}^{(2)}_{STR}(m)$ for different values of $m$


13.3.3 MCMC methods 


Let us consider a Markov chain with transition density $p(x^{(t+1)} \mid x^{(t)}) = \Pr\{X^{(t+1)} = x^{(t+1)} \mid X^{(t)} = x^{(t)}\}$. We also assume that it has a stationary distribution $\pi(x)$ and that the chain is reversible. In this case, the Markov chain satisfies the detailed balance equation⁵⁰:

$$p(y \mid x) \cdot \pi(x) = p(x \mid y) \cdot \pi(y) \qquad (13.14)$$

It follows that:

$$\int p(x \mid y) \, \pi(y) \,\mathrm{d}y = \int p(y \mid x) \, \pi(x) \,\mathrm{d}y = \pi(x) \qquad (13.15)$$

Markov chain Monte Carlo (MCMC) methods are a class of algorithms for simulating a sample from a probability density function $f(x)$. The underlying idea is to simulate a Markov chain, whose limiting pdf is the desired pdf $f(x)$. It is then equivalent to find the transition kernel $\kappa(x^{(t+1)} \mid x^{(t)})$ such that the detailed balance property is satisfied:

$$\kappa(y \mid x) \cdot f(x) = \kappa(x \mid y) \cdot f(y) \qquad (13.16)$$

In this case, MCMC methods then generate a sample such that:

$$\lim_{t \to \infty} \Pr\{X^{(t)} = x\} = f(x)$$

⁴⁹ We have $x_1 = 0.093951$ and $p_1 = 53.74\%$.

⁵⁰ In the discrete case, we have $p_{i,j} \cdot \pi_i = p_{j,i} \cdot \pi_j$ for all the states $(i, j)$ of the Markov chain.


Because the solution of Equation (13.16) is not unique, there are different MCMC methods that will differ by the specification of the transition kernel $\kappa(y \mid x)$.


13.3.3.1 Gibbs sampling 


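According to Casella and George (1992), the Gibbs sampler has been formulated by Geman and Geman (1984), who studied image-processing models, and popularized in statistics by Gelfand and Smith (1990). Let $f(x_1, \ldots, x_n)$ be the target probability density function and $f(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ the conditional density of the $i$-th component of the random vector $X = (X_1, \ldots, X_n)$. At iteration $t$, the Gibbs sampler (GS) is given by the following steps:

• we draw $x_1^{(t)} \sim f\left( x_1 \mid x_2^{(t-1)}, \ldots, x_n^{(t-1)} \right)$;
• we draw $x_2^{(t)} \sim f\left( x_2 \mid x_1^{(t)}, x_3^{(t-1)}, \ldots, x_n^{(t-1)} \right)$;
• we draw $x_3^{(t)} \sim f\left( x_3 \mid x_1^{(t)}, x_2^{(t)}, x_4^{(t-1)}, \ldots, x_n^{(t-1)} \right)$;
• we draw $x_i^{(t)} \sim f\left( x_i \mid x_1^{(t)}, \ldots, x_{i-1}^{(t)}, x_{i+1}^{(t-1)}, \ldots, x_n^{(t-1)} \right)$ for $i = 4, \ldots, n-1$, where the first $i - 1$ conditioning values come from iteration $t$ and the remaining ones from iteration $t - 1$;
• we draw $x_n^{(t)} \sim f\left( x_n \mid x_1^{(t)}, \ldots, x_{n-1}^{(t)} \right)$.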


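The algorithm is initialized with $\left( x_1^{(0)}, \ldots, x_n^{(0)} \right)$. After $t$ iterations, we obtain the following Gibbs sequence:

$$\left\{ \left( x_1^{(1)}, \ldots, x_n^{(1)} \right), \ldots, \left( x_1^{(t)}, \ldots, x_n^{(t)} \right) \right\} \qquad (13.17)$$

Under some conditions and if $t$ is sufficiently large, $\left( x_1^{(t)}, \ldots, x_n^{(t)} \right)$ is a random sample of the joint distribution $f(x_1, \ldots, x_n)$. To obtain $n_S$ simulations of the density $f(x_1, \ldots, x_n)$, Gelfand and Smith (1990) suggested generating $n_S$ Gibbs sequences and taking the final value $\left( x_{1,s}^{(t)}, \ldots, x_{n,s}^{(t)} \right)$ from each sequence $s$. The Monte Carlo estimator of $\mathbb{E}[\varphi(X_1, \ldots, X_n)]$ is then equal to:

$$\hat{I}_{n_S} = \frac{1}{n_S} \sum_{s=1}^{n_S} \varphi\left( x_{1,s}^{(t)}, \ldots, x_{n,s}^{(t)} \right) \qquad (13.18)$$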


However, if the Markov chain has reached its stationary state at time $n_0$, this implies that the Gibbs sequence:

$$\left\{ \left( x_1^{(n_0+1)}, \ldots, x_n^{(n_0+1)} \right), \ldots, \left( x_1^{(n_0+n_S)}, \ldots, x_n^{(n_0+n_S)} \right) \right\} \qquad (13.19)$$

is also a Gibbs sample of the density $f(x_1, \ldots, x_n)$. We can then formulate another MC estimator⁵¹:

$$\hat{I}_{n_S} = \frac{1}{n_S} \sum_{t=1}^{n_S} \varphi\left( x_1^{(n_0+t)}, \ldots, x_n^{(n_0+t)} \right) \qquad (13.20)$$


However, contrary to the Gelfand-Smith approach, the random vectors of this estimator are 
correlated. Therefore, the variance of this estimator is larger than in the independent case. 


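Let us consider the two-dimensional case. We note $(X, Y)$ the random vector and $f(x, y)$ the targeted distribution. At time $t$, we have $X^{(t)} = x$ and $Y^{(t)} = y$, and we assume that $X^{(t+1)} = x'$ and $Y^{(t+1)} = y'$. If $(x, y) \sim f$, the density $g(x', y')$ of the Gibbs sample is equal to⁵²:

$$\begin{aligned}
g(x', y') &= \iint f(x, y) \, f(x' \mid y) \, f(y' \mid x') \,\mathrm{d}x \,\mathrm{d}y \\
&= \iint f(x, y) \, \frac{f(x', y)}{f_y(y)} \, \frac{f(x', y')}{f_x(x')} \,\mathrm{d}x \,\mathrm{d}y
\end{aligned}$$

where $f_x$ and $f_y$ are the marginal densities of the joint distribution $f(x, y)$. It follows that:

$$\begin{aligned}
g(x', y') &= \iint \frac{f(x, y)}{f_y(y)} \, \frac{f(x', y)}{f_x(x')} \, f(x', y') \,\mathrm{d}x \,\mathrm{d}y \\
&= \iint f(x \mid y) \, f(y \mid x') \, f(x', y') \,\mathrm{d}x \,\mathrm{d}y \\
&= f(x', y') \iint f(x \mid y) \, f(y \mid x') \,\mathrm{d}x \,\mathrm{d}y
\end{aligned}$$

Because $\int f(x \mid y) \,\mathrm{d}x = 1$ for every $y$ and $\int f(y \mid x') \,\mathrm{d}y = 1$, we obtain:

$$g(x', y') = f(x', y') \int f(y \mid x') \left( \int f(x \mid y) \,\mathrm{d}x \right) \mathrm{d}y = f(x', y')$$

We deduce that $(x', y') \sim f$. If the Gibbs sampler reaches a stationary regime, it then converges to the targeted distribution $f(x, y)$.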


Remark 169 In the two-dimensional case, we notice that the proposal kernel is equal to:

$$\kappa(x', y' \mid x, y) \propto f(x' \mid y) \cdot f(y' \mid x')$$

In the general case, we have:

$$\kappa(y \mid x) \propto \prod_{i=1}^{n} f(y_i \mid y_1, \ldots, y_{i-1}, x_{i+1}, \ldots, x_n)$$


⁵¹ This approach requires a burn-in period, meaning that an initial number of samples is discarded. However, it is not always obvious to know when the Markov chain has converged.

⁵² We have the following sequences: $(x, y) \to (x', y)$ where $x' \sim f(X \mid y)$ and $(x', y) \to (x', y')$ where $y' \sim f(Y \mid x')$.

Example 158 Casella and George (1992) consider the following joint distribution of $X$ and $Y$:

$$f(x, y) \propto \binom{n}{x} y^{x + \alpha - 1} (1 - y)^{n - x + \beta - 1}$$

where $x \in \{0, 1, \ldots, n\}$ and $y \in [0, 1]$. It follows that:

$$\begin{aligned}
f(x \mid y) &\propto y^{\alpha - 1} (1 - y)^{\beta - 1} \cdot \binom{n}{x} y^{x} (1 - y)^{n - x} \\
&\propto \binom{n}{x} y^{x} (1 - y)^{n - x} \\
&\sim \mathcal{B}(n, y)
\end{aligned}$$

and:

$$\begin{aligned}
f(y \mid x) &\propto \binom{n}{x} y^{x + \alpha - 1} (1 - y)^{n - x + \beta - 1} \\
&\propto y^{x + \alpha - 1} (1 - y)^{n - x + \beta - 1}
\end{aligned}$$

Therefore, $\{X \mid Y = y\}$ is a binomial random variable $\mathcal{B}(n, p)$ with $p = y$ and $\{Y \mid X = x\}$ is a beta random variable $\mathcal{B}(\alpha', \beta')$ with $\alpha' = x + \alpha$ and $\beta' = n - x + \beta$. In Figure 13.48, we have reported the Gibbs sequence of 1000 iterations $\left( x^{(t)}, y^{(t)} \right)$ for the parameters $n = 5$, $\alpha = 2$ and $\beta = 4$. The initial values are $x^{(0)} = 5$ and $y^{(0)} = 1/2$. We assume that the burn-in period corresponds to the initial 200 iterations. We can then calculate $I = \mathbb{E}[X \cdot Y]$ by Monte Carlo with the next 800 iterations. We obtain $\hat{I} = 0.71$. We can also show that the variance of this MCMC estimator is three times larger than the variance of the traditional MC estimator. This is due to the high autocorrelation between the samples⁵³.
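
The Gibbs sampler of this example alternates between a binomial draw and a beta draw. The sketch below follows that scheme with the parameters of the example ($n = 5$, $\alpha = 2$, $\beta = 4$, $x^{(0)} = 5$, $y^{(0)} = 1/2$, burn-in of 200 iterations). The function name and the seed are hypothetical, and the chain is run longer than the 1000 iterations of the example (an assumption made here only to stabilise the estimate). The binomial draw is written as a sum of Bernoulli trials to stay within the standard library.

```python
import random

def gibbs_beta_binomial(n, alpha, beta, n_iter, x0, y0, rng):
    """Gibbs sequence (x^(t), y^(t)) for the beta-binomial joint density."""
    x, y = x0, y0
    chain = []
    for _ in range(n_iter):
        # X | Y = y is binomial B(n, y): sum of n Bernoulli(y) trials
        x = sum(rng.random() < y for _ in range(n))
        # Y | X = x is beta with parameters (x + alpha, n - x + beta)
        y = rng.betavariate(x + alpha, n - x + beta)
        chain.append((x, y))
    return chain

rng = random.Random(123)   # fixed seed, for reproducibility only
chain = gibbs_beta_binomial(n=5, alpha=2, beta=4, n_iter=20_000,
                            x0=5, y0=0.5, rng=rng)
burn_in = 200
kept = chain[burn_in:]
estimate = sum(x * y for x, y in kept) / len(kept)
print(f"Estimate of E[X * Y]: {estimate:.3f}")
```

For this model, $\mathbb{E}[X \cdot Y] = n \cdot \mathbb{E}[Y^2] = 5/7 \approx 0.714$ since the marginal of $Y$ is a beta distribution with parameters $(\alpha, \beta)$, which is in line with the value 0.71 reported in the example.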


13.3.3.2 Metropolis-Hastings algorithm 


Like the Gibbs sampler, the Metropolis-Hastings algorithm considers a multidimensional probability density function $f(x) = f(x_1, \ldots, x_n)$. Let $q(y \mid x) = q(y_1, \ldots, y_n \mid x_1, \ldots, x_n)$ be the Markov transition density or the proposal density. The Metropolis-Hastings (MH) algorithm consists in the following steps:

1. given the state $x^{(t)}$, we generate $y \sim q\left( Y \mid x^{(t)} \right)$ from the Markov transition density;

2. we generate a uniform random number $u \sim U_{[0,1]}$;

3. we calculate the density ratio $r\left( x^{(t)}, y \right)$ defined by:

$$r\left( x^{(t)}, y \right) = \frac{q\left( x^{(t)} \mid y \right) \cdot f(y)}{q\left( y \mid x^{(t)} \right) \cdot f\left( x^{(t)} \right)}$$

and $\alpha\left( x^{(t)}, y \right) = \min\left( r\left( x^{(t)}, y \right), 1 \right)$;

4. we set:

$$x^{(t+1)} = \begin{cases} y & \text{if } u \leq \alpha\left( x^{(t)}, y \right) \\ x^{(t)} & \text{otherwise} \end{cases}$$

The Metropolis-Hastings algorithm can be viewed as an acceptance-rejection algorithm in which the samples are correlated due to the Markov chain (Hastings, 1970).

⁵³ We have:

$$\rho\left( x^{(t)} \cdot y^{(t)}, x^{(t-1)} \cdot y^{(t-1)} \right) = 52\%$$

FIGURE 13.48: Illustration of the Gibbs sampler


The underlying idea of the MH algorithm is to build a kernel density $\kappa(y \mid x)$ such that the Markov chain converges to the targeted distribution $f(x)$. In this case, we must verify that:

$$\kappa(y \mid x) \cdot f(x) = \kappa(x \mid y) \cdot f(y)$$

It would be a pure coincidence that the kernel density $\kappa(y \mid x)$ is equal to the proposal density $q(y \mid x)$. Suppose that:

$$q(y \mid x) \cdot f(x) > q(x \mid y) \cdot f(y)$$

Chib and Greenberg (1995) explain that “the process moves from x to y too often and from y to x too rarely”. To reduce the number of moves from $x$ to $y$, we can introduce the probability $\alpha(x, y) < 1$ such that:

$$q(y \mid x) \cdot \alpha(x, y) \cdot f(x) = q(x \mid y) \cdot \alpha(y, x) \cdot f(y)$$

where $\alpha(y, x) = 1$. We deduce that:

$$\alpha(x, y) = r(x, y) = \frac{q(x \mid y) \cdot f(y)}{q(y \mid x) \cdot f(x)}$$

If $q(y \mid x) \cdot f(x) < q(x \mid y) \cdot f(y)$, we have $q(y \mid x) \cdot f(x) = q(x \mid y) \cdot \alpha(y, x) \cdot f(y)$, that is $\alpha(x, y) = 1$. Because the Markov chain must be reversible, we finally obtain that:

$$\alpha(x, y) = \min\left( \frac{q(x \mid y) \cdot f(y)}{q(y \mid x) \cdot f(x)}, 1 \right)$$

From the previous analysis, we deduce that the kernel density is $\kappa(y \mid x) = q(y \mid x) \, \alpha(x, y)$. However, this result does not take into account that the Markov chain can remain at $x$.
Therefore, Chib and Greenberg (1995) show that the kernel density of the MH algorithm has the following form:

$$\kappa(y \mid x) = q(y \mid x) \, \alpha(x, y) + \left( 1 - \int q(y \mid x) \, \alpha(x, y) \,\mathrm{d}y \right) \delta_x(y)$$

where $\delta_x(y)$ is the Dirac delta function.


Remark 170 The previous analysis shows that $\alpha(x, y)$ is the probability to move from $x$ to $y$. $\alpha(x, y)$ is then the acceptance ratio of the MH algorithm. Contrary to the rejection sampling algorithm, the efficiency of the MH algorithm is not necessarily obtained when $\alpha(x, y)$ is equal to 1. Indeed, there are two sources of correlation between $x^{(t)}$ and $x^{(t+1)}$: (1) the correlation $\rho\left( x^{(t)}, y \right)$ between $x^{(t)}$ and $y$, and (2) the correlation $\rho\left( x^{(t)}, x^{(t)} \right)$ because $y$ is rejected. Therefore, we face a trade-off between reducing the correlation $\rho\left( x^{(t)}, y \right)$ and increasing the acceptance ratio $\alpha\left( x^{(t)}, y \right)$. Suppose for instance that we use a proposal distribution with small variance: the correlation $\rho\left( x^{(t)}, y \right)$ is high, but the acceptance ratio $\alpha\left( x^{(t)}, y \right)$ is high. On the contrary, if we use a proposal distribution with large variance, the correlation $\rho\left( x^{(t)}, y \right)$ is low, but the acceptance ratio $\alpha\left( x^{(t)}, y \right)$ is also low. Therefore, it is extremely difficult to find a proposal distribution such that the correlation $\rho\left( x^{(t)}, y \right)$ is low and the acceptance ratio $\alpha\left( x^{(t)}, y \right)$ is high.


In the original Metropolis algorithm (Metropolis et al., 1953), the authors assumed that the proposal distribution is symmetric: $q(y \mid x) = q(x \mid y)$. In this case, the acceptance ratio is equal to:

$$\alpha(x, y) = \min\left( \frac{f(y)}{f(x)}, 1 \right)$$

An example of such an algorithm is the random walk sampler:

$$Y = x^{(t)} + Z$$

where the random vector $Z$ follows a symmetric distribution. Another special case of the Metropolis-Hastings algorithm is the independence sampler: $q(y \mid x) = q(y)$. The proposal distribution does not depend on $x$ and the acceptance ratio becomes:

$$\alpha(x, y) = \min\left( \frac{q(x) \cdot f(y)}{q(y) \cdot f(x)}, 1 \right)$$

The MH algorithm is then very similar to the rejection sampling method, except that it produces correlated samples. We also notice that the Gibbs sampler is a special case of the MH algorithm where⁵⁴:

$$q(y \mid x) \propto f(y_i \mid x_{-i})$$


Example 159 We consider the simulation of the bivariate Gaussian random vector $X = (X_1, X_2) \sim \mathcal{N}(\mu, \Sigma)$ with the Metropolis-Hastings algorithm and a symmetric proposal distribution. It follows that:

$$\alpha(x, y) = \min\left( \frac{\exp\left( -\frac{1}{2} (y - \mu)^\top \Sigma^{-1} (y - \mu) \right)}{\exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)}, 1 \right)$$

The parameters are $\mu_1 = 1$, $\mu_2 = -1$, $\sigma_1 = 2$, $\sigma_2 = 1$ and $\rho = 99\%$. We use the random walk sampler $Y_i = x_i^{(t)} + Z_i$ for $i = 1, 2$. The random vector $(Z_1, Z_2)$ is generated using the following four proposal distributions:

(a) $Z_1 \sim \mathcal{N}(0, 1)$ and $Z_2 \sim \mathcal{N}(0, 1)$;
(b) $Z_1 \sim \mathcal{N}(0, 0.1)$ and $Z_2 \sim \mathcal{N}(0, 0.1)$;
(c) $Z_1 \sim U_{[-2, 2]}$ and $Z_2 \sim U_{[-2, 2]}$;
(d) $Z_1 \sim U_{[-0.2, 0.2]}$ and $Z_2 \sim U_{[-0.2, 0.2]}$.

⁵⁴ At each iteration, we have $y_i \sim f(x_i \mid x_{-i})$, $y_j = x_j$ if $j \neq i$, and $\alpha(x, y) = 1$.


In Figure 13.49, we have reported the simulated samples of the first 2000 iterations for the four cases when we use a burn-in period of 500 iterations. The sampler is initialized with $x_1^{(0)} = x_2^{(0)} = 0$. The acceptance ratio is respectively equal to 15% (a), 43% (b), 10% (c) and 72% (d). The acceptance ratio is the highest for the case (d). However, we notice that this proposal distribution is slow to explore the entire space. To obtain a sample such that $x_1 > 3$ and $x_2 > 0$, we need more iterations ($n_S > 5000$). On the contrary, proposal distributions (a) and (c) have a high variance. The exploration of the probability space is then faster, but the acceptance ratio is also lower. This example illustrates the trade-off between the autocorrelation and the acceptance ratio.
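
The random walk sampler of this example can be sketched as follows for proposals (a) and (d). This is an illustration under stated assumptions rather than the code behind Figure 13.49: the function names, the seed and the number of iterations are hypothetical, the target density is handled through its logarithm (up to an additive constant), and the acceptance test $u \leq \alpha(x, y)$ is written as $\ln u \leq \ln r(x, y)$ for numerical stability.

```python
import math
import random

MU1, MU2, S1, S2, RHO = 1.0, -1.0, 2.0, 1.0, 0.99

def log_target(x1, x2):
    """Log of the bivariate Gaussian density, up to an additive constant."""
    z1, z2 = (x1 - MU1) / S1, (x2 - MU2) / S2
    return -(z1 * z1 - 2.0 * RHO * z1 * z2 + z2 * z2) / (2.0 * (1.0 - RHO * RHO))

def random_walk_sampler(draw_z, n_iter, rng, start=(0.0, 0.0)):
    """Metropolis algorithm with symmetric proposal Y_i = x_i + Z_i."""
    x1, x2 = start
    chain, accepted = [], 0
    for _ in range(n_iter):
        y1, y2 = x1 + draw_z(rng), x2 + draw_z(rng)
        log_r = log_target(y1, y2) - log_target(x1, x2)
        u = 1.0 - rng.random()      # uniform on (0, 1], so log(u) is defined
        if math.log(u) <= log_r:    # equivalent to u <= min(r, 1)
            x1, x2 = y1, y2
            accepted += 1
        chain.append((x1, x2))      # the chain stays at x when y is rejected
    return chain, accepted / n_iter

rng = random.Random(159)   # fixed seed, for reproducibility only
chain_a, acc_a = random_walk_sampler(lambda g: g.gauss(0.0, 1.0), 5000, rng)
chain_d, acc_d = random_walk_sampler(lambda g: g.uniform(-0.2, 0.2), 5000, rng)
print(f"acceptance ratio (a): {acc_a:.1%}, (d): {acc_d:.1%}")
```

With a narrow proposal such as (d), most moves are accepted but each move is small, whereas the wider proposal (a) is rejected more often; the exact ratios depend on the seed and on the number of iterations.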


FIGURE 13.49: Illustration of the random walk sampler (one panel per proposal distribution)


The previous example is purely illustrative, because we don’t need to use MCMC for 
simulating Gaussian random vectors. Let us now consider the following bivariate probability 
density functions: 


(a) the pdf is a perturbation of the Gaussian density function: 


f (£1, £2) X exp (-2} — 7 — zı) 
(b) the pdf is a mixture of two Gaussian density functions:

$$f(x_1, x_2) \propto \exp\left( -x_1^2 - x_2^2 \right) + \exp\left( -x_1^2 - x_2^2 - 1.8 \cdot x_1 \cdot x_2 \right)$$


(c) the pdf is a complex function of $\Phi(x_1)$ and $\Phi(x_2)$:
f (1, £2) x exp (—0.1 - ® (x1) - ® (x2) - x2) 


(d) we consider the pdf of the Clayton copula with two exponential marginal distributions⁵⁵:

$$f(x_1, x_2) = (1 + \theta) \, \lambda_1 \lambda_2 \, e^{-\lambda_1 x_1 - \lambda_2 x_2} \left( u_1^{-\theta} + u_2^{-\theta} - 1 \right)^{-1/\theta - 2} (u_1 u_2)^{-\theta - 1}$$

where $u_1 = 1 - e^{-\lambda_1 x_1}$ and $u_2 = 1 - e^{-\lambda_2 x_2}$.

We notice that we don't know the normalization constant for the first three pdfs. The third pdf is very complex and needs a very accurate algorithm for computing the Gaussian cdf. The fourth case is a copula model. We use the bivariate Gaussian probability distribution with a correlation equal to 50% for the proposal distribution. A sample of 1500 iterations simulated with the random walk sampler is given in Figure 13.50. The four panels have been obtained with the same random numbers of $U$, $Z_1$ and $Z_2$.


FIGURE 13.50: Simulating bivariate probability distributions with the MH algorithm 


Remark 171 For high-dimensional MCMC problems, the proposal distribution is generally the Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ or the multivariate Student's t distribution $t_n(\Sigma, \nu)$.

⁵⁵ We use the parameters $\theta = 2.5$, $\lambda_1 = 1\%$ and $\lambda_2 = 5\%$.
13.3.3.3 Sequential Monte Carlo methods and particle filters

Sequential Monte Carlo methods (or SMC) and particle filters (PF) are used when we consider non-linear/non-Gaussian state space models (or hidden Markov models). In these models, the state vector $X^{(t)}$ is characterized by the transition density:

$$X^{(t)} = x' \mid X^{(t-1)} = x \sim f(x' \mid x)$$

We assume that the state vector $X^{(t)}$ is not directly observed. The process that is observed is denoted by $Y^{(t)}$ and is characterized by the measurement density:

$$Y^{(t)} = y \mid X^{(t)} = x \sim f(y \mid x)$$

Let $x^{(0:T)} = \left( x^{(0)}, \ldots, x^{(T)} \right)$ be the exhaustive statistic of the system. By the Markov property, we have:

$$f\left( x^{(0:T)} \right) = f\left( x^{(0)} \right) \prod_{t=1}^{T} f\left( x^{(t)} \mid x^{(t-1)} \right)$$

We note $\mathcal{I}_t$ all the information available at time $t$ (including $y^{(1)}, \ldots, y^{(t)}$). To calculate $f\left( x^{(t)} \mid \mathcal{I}_t \right)$, we have:

$$f\left( x^{(t)} \mid \mathcal{I}_t \right) \propto f\left( y^{(t)} \mid x^{(t)} \right) f\left( x^{(t)} \mid \mathcal{I}_{t-1} \right) \qquad (13.21)$$

where $f\left( x^{(t)} \mid \mathcal{I}_{t-1} \right)$ is the prior density of $X^{(t)}$ and $f\left( y^{(t)} \mid x^{(t)} \right)$ is the likelihood function of the observed variable $Y^{(t)}$. This equation is known as the Bayes update step. We recall that the prior density of the state variable $X^{(t)}$ is given by the Chapman-Kolmogorov equation:

$$f\left( x^{(t)} \mid \mathcal{I}_{t-1} \right) = \int f\left( x^{(t)} \mid x^{(t-1)} \right) f\left( x^{(t-1)} \mid \mathcal{I}_{t-1} \right) \mathrm{d}x^{(t-1)} \qquad (13.22)$$

This equation is also known as the Bayes prediction step. It gives an estimate of the probability density function of $X^{(t)}$ given all the information until $t - 1$. A Bayesian filter corresponds to the system of the two recursive equations (13.21) and (13.22). In order to initialize the recurrence algorithm, we assume that the density function of the initial state vector $f\left( x^{(0)} \right)$ is known.

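The underlying idea of SMC methods is then to estimate the density functions $f\left( x^{(t)} \mid \mathcal{I}_{t-1} \right)$ and $f\left( x^{(t)} \mid \mathcal{I}_t \right)$. Given these estimates, we may also compute the best estimates $\hat{x}^{(t \mid t-1)}$ and $\hat{x}^{(t)}$, which are given by:

$$\hat{x}^{(t \mid t-1)} = \mathbb{E}\left[ X^{(t)} \mid \mathcal{I}_{t-1} \right] = \int x^{(t)} f\left( x^{(t)} \mid \mathcal{I}_{t-1} \right) \mathrm{d}x^{(t)}$$

and:

$$\hat{x}^{(t)} = \mathbb{E}\left[ X^{(t)} \mid \mathcal{I}_t \right] = \int x^{(t)} f\left( x^{(t)} \mid \mathcal{I}_t \right) \mathrm{d}x^{(t)}$$

For that, we estimate the density $f\left( x^{(t)} \mid \mathcal{I}_t \right)$ by the Monte Carlo method. At time $t - 1$, we assume that $f\left( x^{(t-1)} \mid \mathcal{I}_{t-1} \right)$ is approximated by a sample $\left\{ x_1^{(t-1)}, \ldots, x_{n_S}^{(t-1)} \right\}$ and a vector of associated weights $\left\{ w_1^{(t-1)}, \ldots, w_{n_S}^{(t-1)} \right\}$, where $n_S$ is the number of simulated particles. We deduce that:

$$f\left( x^{(t)} \mid \mathcal{I}_t \right) \propto f\left( y^{(t)} \mid x^{(t)} \right) \sum_{s=1}^{n_S} w_s^{(t-1)} f\left( x^{(t)} \mid x_s^{(t-1)} \right)$$

The problem consists then in estimating the states $\left\{ x_1^{(t)}, \ldots, x_{n_S}^{(t)} \right\}$ and the corresponding weights $\left\{ w_1^{(t)}, \ldots, w_{n_S}^{(t)} \right\}$.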
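Computation of weights Following Arulampalam et al. (2002), we apply the method of importance sampling to the joint distribution of $x^{(0:t)} = \left( x^{(0)}, \ldots, x^{(t)} \right)$:

$$\begin{aligned}
f\left( x^{(0:t)} \mid \mathcal{I}_t \right) &= \frac{f\left( y^{(t)} \mid x^{(t)} \right) f\left( x^{(t)} \mid x^{(t-1)} \right)}{f\left( y^{(t)} \mid \mathcal{I}_{t-1} \right)} f\left( x^{(0:t-1)} \mid \mathcal{I}_{t-1} \right) \\
&\propto f\left( y^{(t)} \mid x^{(t)} \right) f\left( x^{(t)} \mid x^{(t-1)} \right) f\left( x^{(0:t-1)} \mid \mathcal{I}_{t-1} \right)
\end{aligned}$$

Let $q$ be the instrumental density such that it factorizes in the following way:

$$q\left( x^{(0:t)} \mid y^{(t)}, \mathcal{I}_{t-1} \right) = q\left( x^{(t)} \mid y^{(t)}, x^{(0:t-1)}, \mathcal{I}_{t-1} \right) q\left( x^{(0:t-1)} \mid \mathcal{I}_{t-1} \right)$$

We deduce that⁵⁶:

$$\begin{aligned}
w_s^{(t)} &\propto \frac{f\left( x_s^{(0:t)} \mid y^{(t)}, \mathcal{I}_{t-1} \right)}{q\left( x_s^{(0:t)} \mid y^{(t)}, \mathcal{I}_{t-1} \right)} \\
&\propto \frac{f\left( y^{(t)} \mid x_s^{(t)} \right) f\left( x_s^{(t)} \mid x_s^{(t-1)} \right) f\left( x_s^{(0:t-1)} \mid \mathcal{I}_{t-1} \right)}{q\left( x_s^{(0:t)} \mid y^{(t)}, \mathcal{I}_{t-1} \right)} \\
&\propto \frac{f\left( y^{(t)} \mid x_s^{(t)} \right) f\left( x_s^{(t)} \mid x_s^{(t-1)} \right) f\left( x_s^{(0:t-1)} \mid \mathcal{I}_{t-1} \right)}{q\left( x_s^{(t)} \mid y^{(t)}, x_s^{(0:t-1)}, \mathcal{I}_{t-1} \right) q\left( x_s^{(0:t-1)} \mid \mathcal{I}_{t-1} \right)} \\
&\propto w_s^{(t-1)} \cdot \frac{f\left( y^{(t)} \mid x_s^{(t)} \right) f\left( x_s^{(t)} \mid x_s^{(t-1)} \right)}{q\left( x_s^{(t)} \mid y^{(t)}, x_s^{(0:t-1)}, \mathcal{I}_{t-1} \right)}
\end{aligned} \qquad (13.23)$$

The posterior density at time $t$ can then be approximated as:

$$f\left( x^{(t)} \mid \mathcal{I}_t \right) \approx \sum_{s=1}^{n_S} w_s^{(t)} \, \delta_x\left( x^{(t)} - x_s^{(t)} \right) \qquad (13.24)$$

The previous computations lead to the sequential importance sampling (or SIS) algorithm:

1. at time $t$, we simulate $x_s^{(t)} \sim q\left( x^{(t)} \mid y^{(t)}, x_s^{(0:t-1)}, \mathcal{I}_{t-1} \right)$;

2. we update the weight $w_s^{(t)}$ using Equation (13.23);

3. we repeat steps 1 and 2 in order to obtain a sample of $n_S$ particles;

4. we normalize the weights:

$$w_s^{(t)} \leftarrow \frac{w_s^{(t)}}{\sum_{s'=1}^{n_S} w_{s'}^{(t)}}$$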

Remark 172 With the SIS algorithm, the variance of the weights increases exponentially with the number of time steps. This implies that some particles have negligible weights whereas others have large weights. This is why resampling techniques are generally added in order to reduce this phenomenon. Another method consists in simulating new (or auxiliary) particles at each time. Therefore, there exist several SMC algorithms (Arulampalam et al., 2002; Doucet and Johansen, 2009): auxiliary particle filter (APF), generic particle filter (GPF), regularized particle filter (RPF), sampling importance resampling (SIR), etc.

⁵⁶ Note that $y^{(t)} \in \mathcal{I}_t$.
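An example We consider the following example:

$$\begin{aligned}
f\left( x^{(t)} \mid x^{(t-1)} \right) &= \mathcal{N}\left( x^{(t)}; T\left( t, x^{(t-1)} \right), Q \right) \\
f\left( y^{(t)} \mid x^{(t)} \right) &= \mathcal{N}\left( y^{(t)}; \kappa \left( x^{(t)} \right)^2, H \right)
\end{aligned}$$

The corresponding state space model is equal to:

$$\begin{cases}
x^{(t)} = T\left( t, x^{(t-1)} \right) + \eta^{(t)} \\
y^{(t)} = \kappa \left( x^{(t)} \right)^2 + \varepsilon^{(t)}
\end{cases} \qquad (13.25)$$

where $\eta^{(t)} \sim \mathcal{N}(0, Q)$ and $\varepsilon^{(t)} \sim \mathcal{N}(0, H)$. We notice that the state space model is non-linear. The previous example has been extensively studied with the following specification (Carlin et al., 1992; Kitagawa, 1996):

$$T(t, x) = \frac{x}{2} + \frac{25 x}{1 + x^2} + 8 \cos(1.2 \cdot t)$$

With the following values of parameters $\kappa = 1/20$, $Q = 1$ and $H = 10$, we simulate the model (13.25) and estimate $x^{(t)}$ by considering different particle filters. The likelihood and prior transition densities are given by $f\left( y^{(t)} \mid x^{(t)} \right)$ and $f\left( x^{(t)} \mid x^{(t-1)} \right)$. We assume that the instrumental density $q\left( x^{(t)} \mid y^{(t)}, x^{(0:t-1)}, \mathcal{I}_{t-1} \right)$ is equal to the transition density $f\left( x^{(t)} \mid x^{(t-1)} \right)$, meaning that the knowledge of $y^{(t)}$ does not improve the estimate of the state $x^{(t)}$. The particles are simulated according to the following scheme:

$$x_s^{(t)} = T\left( t, x_s^{(t-1)} \right) + \eta_s^{(t)}$$

where $\eta_s^{(t)} \sim \mathcal{N}(0, Q)$. In Figure 13.51, we report one simulation of the state space model. Recall that we observe $y^{(t)}$ and not $x^{(t)}$. The estimate $\hat{x}^{(t)}$ is equal to:

$$\hat{x}^{(t)} = \sum_{s=1}^{n_S} w_s^{(t)} \cdot x_s^{(t)}$$

where $x_s^{(t)}$ is the simulated value of $x^{(t)}$ for the $s$-th particle and $w_s^{(t)}$ is the importance weight given by Equation (13.23). For each simulation, we also calculate the root mean squared error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( x^{(t)} - \hat{x}^{(t)} \right)^2}$$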


where $x^{(t)}$ is the true value of the state and $\hat{x}^{(t)}$ is the estimate. We report the probability density function of the RMSE statistic in Figure 13.52 when the number of particles is equal to 1000. We notice that the SIR algorithm is better than the other SMC algorithms for this example. Figure 13.53 illustrates the convergence of the SIS algorithm with respect to the number of particles.
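
The SIS recursion and its resampled variant can be sketched for the state space model (13.25) as follows. This is a minimal illustration, not the implementation used for Figures 13.51 to 13.53. It assumes the parameter values quoted above ($\kappa = 1/20$, $Q = 1$, $H = 10$), takes the transition density as instrumental density so that the weight update reduces to $w_s^{(t)} \propto w_s^{(t-1)} f(y^{(t)} \mid x_s^{(t)})$, and uses multinomial resampling for the SIR variant. The initial distribution $x^{(0)} \sim \mathcal{N}(0, 1)$, the function names and the seed are assumptions of this sketch. Weights are handled in logarithms to avoid numerical underflow.

```python
import math
import random

KAPPA, Q, H = 1 / 20, 1.0, 10.0   # parameter values quoted in the text

def transition(t, x):
    """T(t, x) of the Carlin et al. (1992) / Kitagawa (1996) specification."""
    return x / 2 + 25 * x / (1 + x * x) + 8 * math.cos(1.2 * t)

def simulate_model(n_steps, rng):
    """Simulate (x, y) paths; x^(0) ~ N(0, 1) is an assumption of this sketch."""
    x, xs, ys = rng.gauss(0.0, 1.0), [], []
    for t in range(1, n_steps + 1):
        x = transition(t, x) + rng.gauss(0.0, math.sqrt(Q))
        xs.append(x)
        ys.append(KAPPA * x * x + rng.gauss(0.0, math.sqrt(H)))
    return xs, ys

def normalize(log_w):
    """Turn log-weights into weights that sum to one (max-shift for stability)."""
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def particle_filter(ys, n_s, rng, resample):
    """SIS when resample=False, SIR (multinomial resampling) when True."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_s)]
    log_w = [0.0] * n_s
    estimates = []
    for t, y in enumerate(ys, start=1):
        # Proposal = transition density, so the weight update is w * f(y | x)
        particles = [transition(t, x) + rng.gauss(0.0, math.sqrt(Q))
                     for x in particles]
        log_w = [lw - 0.5 * (y - KAPPA * x * x) ** 2 / H
                 for lw, x in zip(log_w, particles)]
        w = normalize(log_w)
        estimates.append(sum(wi * xi for wi, xi in zip(w, particles)))
        if resample:
            particles = rng.choices(particles, weights=w, k=n_s)
            log_w = [0.0] * n_s   # equal weights after resampling
    return estimates

def rmse(xs, estimates):
    return math.sqrt(sum((x - e) ** 2 for x, e in zip(xs, estimates)) / len(xs))

rng = random.Random(7)   # fixed seed, for reproducibility only
xs, ys = simulate_model(100, rng)
for name, flag in (("SIS", False), ("SIR", True)):
    est = particle_filter(ys, 1000, rng, resample=flag)
    print(f"{name}: RMSE = {rmse(xs, est):.3f}")
```

Without resampling, the log-weights accumulate over time and a few particles end up carrying almost all the weight, which is the degeneracy phenomenon that resampling is meant to reduce.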


13.3.4 Quasi-Monte Carlo simulation methods 


We consider the following Monte Carlo problem:

$$I = \int \cdots \int_{[0,1]^n} \varphi(x_1, \ldots, x_n) \,\mathrm{d}x_1 \cdots \mathrm{d}x_n$$

Let $X$ be the random vector of independent uniform random variables. It follows that $I = \mathbb{E}[\varphi(X)]$. The Monte Carlo method consists in generating uniform coordinates in the hypercube $[0, 1]^n$. Quasi-Monte Carlo methods use non-random coordinates in order to obtain a more evenly spread set of points. A low discrepancy sequence $\mathcal{U} = \{u_1, \ldots, u_{n_S}\}$ is then a set of deterministic points distributed in the hypercube $[0, 1]^n$. Let us define the star discrepancy of $\mathcal{U}$ by $D^\star_{n_S}(\mathcal{U})$:

$$D^\star_{n_S}(\mathcal{U}) = \sup_{x \in [0,1]^n} \left| \frac{1}{n_S} \sum_{s=1}^{n_S} \prod_{i=1}^{n} \mathbb{1}\{u_{s,i} \leq x_i\} - \prod_{i=1}^{n} x_i \right|$$

We could interpret $D^\star$ as the $L_\infty$ norm between the theoretical continuous uniform distribution and the discrete uniform distribution generated by the low discrepancy sequence $\mathcal{U}$. We note that if $\mathcal{U}$ is really uniform, then $\lim_{n_S \to \infty} D^\star_{n_S}(\mathcal{U}) = 0$ for every dimension $n$. Moreover, Morokoff and Caflisch (1994) noticed that:

$$\left| \frac{1}{n_S} \sum_{s=1}^{n_S} f(u_s) - \int_{[0,1]^n} f(x) \,\mathrm{d}x \right| \leq V(f) \cdot D^\star_{n_S}(\mathcal{U})$$

where $V(f)$ is the Hardy-Krause variation of $f$. We could find low discrepancy sequences such that the error is of order $n_S^{-1} (\ln n_S)^n$ in probability (Morokoff and Caflisch, 1994). If we compare this bound with the order of convergence $n_S^{-1/2}$ of MC, we notice that QMC is theoretically better than MC for small dimensions, but MC is better than QMC for high dimensions. However, in practice, it appears that QMC could be more accurate than MC even for large dimension $n$.

FIGURE 13.51: An example of a SMC run with 1000 particles

FIGURE 13.52: Density of the RMSE statistic for 1000 particles

FIGURE 13.53: Density of the RMSE statistic for the SIS algorithm


FIGURE 13.54: Comparison of different low discrepancy sequences

FIGURE 13.55: The Sobol generator

FIGURE 13.56: Quasi-random points on the unit sphere (panels: Halton ($p_1 = 2$, $p_2 = 3$), Halton ($p_1 = 17$, $p_2 = 19$), Hammersley ($p_1 = 2$), Hammersley ($p_1 = 7$), Hammersley ($p_1 = 13$), Hammersley ($p_1 = 37$), Sobol, Faure)

TABLE 13.8: Pricing of the spread option using quasi-Monte Carlo methods

n_S               10^2       10^3       10^4       10^5       10^6       5×10^6
LCG (1)           4.3988     5.9173     5.8050     5.8326     5.8215     5.8139
LCG (2)           6.1504     6.1640     5.8370     5.8219     5.8265     5.8198
LCG (3)           6.1469     5.7811     5.8125     5.8015     5.8142     5.8197
Hammersley (1)    32.7510    26.5326    21.5500    16.1155    9.0914     5.8199
Hammersley (2)    32.9082    26.4629    21.5465    16.1149    9.0914     5.8199
Halton (1)        8.6256     6.1205     5.8493     5.8228     5.8209     5.8208
Halton (2)        10.6415    6.0526     5.8544     5.8246     5.8208     5.8207
Halton (3)        8.5292     6.0575     5.8474     5.8235     5.8212     5.8208
Sobol             5.7181     5.7598     5.8163     5.8190     5.8198     5.8198
Faure             5.7256     5.7718     5.8157     5.8192     5.8197     5.8198

Glasserman (2003) reviewed different quasi-random sequences. The best known are the Halton, Sobol and Faure sequences, and corresponding numerical codes are available in different programming languages (Press et al., 2007). The techniques to generate these sequences are based on number theory. For example, the Halton sequence is based on the $p$-adic expansion of integers $n = d_k p^k + \cdots + d_1 p + d_0$ and the radical-inverse function $\varrho_p(n) = \sum_{i=0}^{k} d_i p^{-(i+1)}$ where $d_i \in \{0, \ldots, p-1\}$ for $i = 0, \ldots, k$. The $d$-dimensional Halton sequence $\varrho = \{\varrho_n\}$ is then defined by $\varrho_n = \left( \varrho_{p_1}(n), \ldots, \varrho_{p_d}(n) \right)$ where $p_1, \ldots, p_d$ are integers that are greater than one and pairwise relatively prime. We represent this sequence when $d = 2$ and $n_S = n = 1024$, and compare it to LCG random variates and Hammersley and Faure sequences in Figure 13.54. The underlying idea of QMC is to add the new points not randomly, but between the existing points. For example, we have added 256 points in the Sobol sequence⁵⁷ in each panel of Figure 13.55. Finally, we report the projection of different low discrepancy sequences on the unit sphere in Figure 13.56. We notice that we can generate some ‘hole area’.
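
As an illustration of the radical-inverse construction described above, the following sketch generates a two-dimensional Halton sequence with bases $p_1 = 2$ and $p_2 = 3$ and uses it to integrate a simple test function. The function names and the choice of the integrand $\varphi(x_1, x_2) = x_1 x_2$, whose integral over $[0,1]^2$ is $1/4$, are assumptions made for the example.

```python
def radical_inverse(n, p):
    """Radical-inverse function: the base-p digits d_0, d_1, ... of n are
    mirrored around the radix point, giving sum_i d_i * p^-(i+1)."""
    value, scale = 0.0, 1.0 / p
    while n > 0:
        n, digit = divmod(n, p)
        value += digit * scale
        scale /= p
    return value

def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence; bases must be pairwise coprime."""
    return [tuple(radical_inverse(n, p) for p in bases)
            for n in range(1, n_points + 1)]

points = halton(1024)
qmc_estimate = sum(u1 * u2 for u1, u2 in points) / len(points)
print(points[:3])
print(f"QMC estimate of the integral of x1 * x2 over [0,1]^2: {qmc_estimate:.5f}")
```

In base 2, the first values of the radical inverse are 1/2, 1/4, 3/4, 1/8, and so on: each new point falls between the existing ones, which is the behaviour illustrated for the Sobol sequence in Figure 13.55.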


Example 160 We consider a spread option whose payoff is equal to $(S_1(T) - S_2(T) - K)^+$. The price is calculated using the Black-Scholes model, and the following parameters: $S_1(0) = S_2(0) = 100$, $\sigma_1 = \sigma_2 = 20\%$, $\rho = 50\%$ and $r = 5\%$. The maturity $T$ of the option is set to one year, whereas the strike $K$ is equal to 5. We estimate the option price with several QMC methods and different numbers of simulations $n_S$. Results are given in Table 13.8. We consider three seed values for the LCG pseudorandom sequences. In the case of Hammersley sequences, we use $p_1 = 2$ (sequence 1) and $p_1 = 7$ (sequence 2). We also use three Halton sequences based on the following values of $(p_1, p_2)$: (2, 3), (17, 19) and (2, 19). Finally, we consider Sobol and Faure sequences when the dimension is equal to 2. For simulating Gaussian random variates, we use the Box-Muller method. The true price of the spread option is equal to 5.8198. We notice that only the Sobol and Faure generators have converged to this price when the number of simulations is equal to one million.

⁵⁷ The new points correspond to a square symbol.


13.4 Exercises 
13.4.1 Simulating random numbers using the inversion method 
1. Propose an algorithm to simulate random variates for the following distribution functions:

(a) the generalized extreme value distribution $\mathrm{GEV}(\mu, \sigma, \xi)$;
(b) the log-normal distribution $\mathcal{LN}(\mu, \sigma^2)$;
(c) the log-logistic distribution $\mathcal{LL}(\alpha, \beta)$.

2. When we model operational risk losses, we are interested in the conditional random
variable $L = X \mid X > H$ where H is a given threshold.

(a) How can we simulate L if we consider a random number generator of X?
(b) Let $F_X$ be the distribution function of X. Give the conditional distribution $F_L$.
(c) Find the inverse function $F_L^{-1}$ and propose an algorithm to simulate L.
(d) Compare the two algorithms in terms of efficiency.
(e) Apply algorithms (a) and (c) to the log-normal distribution $\mathcal{LN}(\mu, \sigma^2)$. We assume
that $\mu = 7$, $\sigma = 2.3$ and $H = \$50\,000$. Simulate 100 random numbers⁵⁸ and
draw the scatterplot between the uniform random numbers $u_i$ and the random
variates $L_i$.

3. We consider the extreme order statistics $X_{1:n} = \min(X_1, \ldots, X_n)$ and
$X_{n:n} = \max(X_1, \ldots, X_n)$.

(a) How can we simulate $X_{1:n}$ and $X_{n:n}$ if we have a random generator for $X_i$?
(b) Calculate the distribution functions $F_{1:n}$ and $F_{n:n}$. Deduce an efficient algorithm
to simulate $X_{1:n}$ and $X_{n:n}$.
(c) Using the previous algorithm, simulate 1000 random numbers of $X_{1:50}$ and $X_{50:50}$
when $X_i \sim \mathcal{N}(0, 1)$.
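One possible way to approach Question 2 is sketched below in Python for the log-normal case, with the parameter values of Question 2(e). It assumes that the conditional distribution is $F_L(x) = (F_X(x) - F_X(H)) / (1 - F_X(H))$ for $x \geq H$, so that $F_L^{-1}(u) = F_X^{-1}(F_X(H) + u(1 - F_X(H)))$. The function names are hypothetical, and Python's default Mersenne Twister generator is used instead of the Lewis-Goodman-Miller generator suggested in the footnote.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian distribution (cdf and inv_cdf)


def simulate_by_rejection(mu, sigma, threshold, rng):
    """Draw X ~ LN(mu, sigma^2) repeatedly until X > threshold; also return the number of draws."""
    draws = 0
    while True:
        draws += 1
        x = math.exp(mu + sigma * rng.gauss(0.0, 1.0))
        if x > threshold:
            return x, draws


def simulate_by_inversion(mu, sigma, threshold, u):
    """Map one uniform number u in [0, 1) to L = F_X^{-1}(F_X(H) + u (1 - F_X(H)))."""
    p_h = STD_NORMAL.cdf((math.log(threshold) - mu) / sigma)  # F_X(H)
    p = min(p_h + u * (1.0 - p_h), 1.0 - 2.0 ** -53)  # keep p strictly below 1
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))


rng = random.Random(123456)
uniforms = [rng.random() for _ in range(100)]
losses = [simulate_by_inversion(7.0, 2.3, 50000.0, u) for u in uniforms]
loss, n_draws = simulate_by_rejection(7.0, 2.3, 50000.0, rng)
print(min(losses), n_draws)
```

The inversion approach uses exactly one uniform number per simulated loss, whereas the number of draws needed by the rejection approach is random and grows as $1 - F_X(H)$ becomes small.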


58We can use the Lewis-Goodman-Miller generator with a seed equal to 123 456. 
13.4.2 Simulating random numbers using the transformation method
1. We consider the random variable X, whose probability density function is given by:

$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{-\alpha - 1} e^{-\beta / x}$$

Calculate the density function of Y = 1/X. Deduce that X follows the inverse-gamma
distribution $\mathcal{IG}(\alpha, \beta)$. Find an algorithm to simulate X.


2. Let $X \sim \mathcal{G}(\alpha, \beta)$ be a gamma distributed random variable.

(a) Show that the case $\alpha = 1$ corresponds to the exponential distribution with
parameter $\lambda = \beta$:

$$\mathcal{G}(1, \beta) = \mathcal{E}(\beta)$$

Deduce an algorithm to simulate $\mathcal{G}(1, \beta)$.

(b) When $\alpha$ is an integer and is equal to n, show that $X \sim \mathcal{G}(n, \beta)$ is the sum
of n independent exponential random variables with parameter $\beta$. Deduce an
algorithm to simulate $\mathcal{G}(n, \beta)$.


3. Let $X \sim \mathcal{B}(\alpha, \beta)$ be a beta distributed random variable.

(a) We note $Y \sim \mathcal{G}(\alpha, \delta)$ and $Z \sim \mathcal{G}(\beta, \delta)$ two independent gamma distributed
random variables. Show that⁵⁹:

$$X = \frac{Y}{Y + Z}$$

(b) Deduce an algorithm to simulate X.


4. The polar method considers the random vector (X, Y) defined by:

$$X = R \cos\Theta, \qquad Y = R \sin\Theta$$

where R and $\Theta$ are two independent random variables. We assume that $R \sim F_R$ and
$\Theta \sim \mathcal{U}_{[0, 2\pi]}$.

(a) Show that the joint density of (X, Y) is equal to:

$$f_{X,Y}(x, y) = \frac{f_R\left(\sqrt{x^2 + y^2}\right)}{2\pi \sqrt{x^2 + y^2}}$$

Deduce the expression of the density function $f_X(x)$ of X.

(b) We assume that $R = \sqrt{2E}$ where $E \sim \mathcal{E}(1)$.

i. Show that $f_R(r) = r e^{-r^2/2}$.
ii. Calculate $f_X(x)$.
iii. Deduce the Box-Muller algorithm to simulate normal distributed random
variables.


59Hint: consider the change of variables X = Y / (Y + Z) and S = Y + Z, and calculate the joint density
function $f_{X,S}$.

(c) Bailey (1994) proposed to simulate the Student’s $t_\nu$ distribution with the polar
method. For that, he considered the distribution:

$$F_R(r) = 1 - \left(1 + \frac{r^2}{\nu}\right)^{-\nu/2}$$

where $r \geq 0$ and $\nu > 0$.

i. Calculate $f_R(r)$.
ii. Show that:

$$f_{X,Y}(x, y) = \frac{1}{2\pi} \left(1 + \frac{x^2 + y^2}{\nu}\right)^{-\nu/2 - 1}$$

iii. Find the expression⁶⁰ of $f_X(x)$. Deduce that X is a Student’s t random
variable with $\nu$ degrees of freedom.
iv. Find an algorithm to simulate R.
v. Deduce an algorithm to simulate Student’s $t_\nu$ random variables.
vi. What is the main difference with the Box-Muller algorithm?
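The polar construction of Question 4(c) can be sketched in Python as follows. The sketch assumes that R is simulated by inverting $F_R$, which gives $R = \sqrt{\nu (U^{-2/\nu} - 1)}$ for a uniform random number U, and that $\Theta = 2\pi V$ for a second, independent uniform random number V. The function name `student_t_polar` is hypothetical.

```python
import math
import random


def student_t_polar(nu, rng):
    """Simulate X = R cos(Theta), where R has distribution F_R(r) = 1 - (1 + r^2 / nu)^(-nu / 2)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids a division by zero
    v = rng.random()
    radius = math.sqrt(nu * (u ** (-2.0 / nu) - 1.0))  # inversion of F_R
    return radius * math.cos(2.0 * math.pi * v)


rng = random.Random(42)
sample = [student_t_polar(5.0, rng) for _ in range(20000)]
mean = sum(sample) / len(sample)
variance = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
print(mean, variance)  # the theoretical variance of t_5 is 5 / 3
```

The companion variate $Y = R \sin\Theta$ has the same marginal distribution, which is why the sketch returns only X.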


13.4.3 Simulating random numbers using rejection sampling 
1. We consider the beta distributed random variable $X \sim \mathcal{B}(\alpha, \beta)$. We assume that
$\alpha \geq 1$ and $\beta \geq 1$.

(a) We use the proposal density function $g(x) = 1$. Calculate the function $h(x)$
defined as follows:

$$h(x) = \frac{f(x)}{g(x)}$$

Show that $h(x)$ achieves its maximum at the point:

$$x^\star = \frac{\alpha - 1}{\alpha + \beta - 2}$$

Deduce the value of c that maximizes the acceptance ratio.

(b) Plot the functions $f(x)$ and $c\,g(x)$ for the following parameters $(\alpha, \beta)$: (1.5, 1.5),
(3,2), (1,8) and (5,7). Comment on these results.

(c) Propose an algorithm to simulate $\mathcal{B}(\alpha, \beta)$ when $\alpha \geq 1$ and $\beta \geq 1$.


2. We consider the beta distributed random variable $X \sim \mathcal{B}(\alpha, \beta)$. We assume that
$\alpha < 1$ and $\beta \geq 1$.

(a) We use the proposal density function $g(x) = \alpha x^{\alpha - 1}$. Find the value of c that
maximizes the acceptance ratio.
(b) Give an algorithm to simulate the random variable $X \sim G$.
(c) Give an algorithm to simulate $\mathcal{B}(\alpha, \beta)$ when $\alpha < 1$ and $\beta \geq 1$.


3. We consider the standard Gaussian random variable X ~ N (0,1). 


§& X=] 
60Hint: consider the change of variable u = (1 + es) 
(a) We use the Laplace distribution as the proposal distribution:

$$g(x) = \frac{1}{2} e^{-|x|}$$

Calculate $G(x)$ and $G^{-1}(x)$. Give an algorithm to simulate the Laplace distribution.

(b) Find the value of c that maximizes the acceptance ratio. Draw the functions $f(x)$
and $c\,g(x)$.

(c) Deduce the acceptance-rejection algorithm.

4. We consider the standard gamma random variable $X \sim \mathcal{G}(\alpha)$ where $\alpha \geq 1$.

(a) We use the Cauchy distribution as the proposal distribution:

$$g(x) = \frac{1}{\pi (1 + x^2)}$$
Show that: f(a) . 
T T xet! =r 
sa Ta 


Find the value of c that maximizes the acceptance ratio. 


(b) We use the Student’s t distribution with 2 degrees of freedom as the proposal
distribution. Calculate the analytical expression of $G(x)$. Deduce an algorithm
to simulate X.

(c) In the case (b), Devroye (1986) showed that:

$$f(x) \leq c\,g(x)$$

where:

(d) What is the most efficient method between algorithms (a) and (b)?


5. We consider a discrete random variable X with a finite number of states $x_k$ where
$k = 1, \ldots, K$. We note $p(k) = \Pr\{X = x_k\}$ its probability mass function.

(a) We consider the following proposal distribution:

$$q(k) = \frac{1}{K}$$

Find the value of c that maximizes the acceptance ratio.

(b) We consider the following distribution:

x_k     0     15    33    56    89
p(k)    10%   20%   40%   20%   10%


Simulate 1000 random numbers using the acceptance-rejection algorithm and 
draw the histogram of accepted and rejected values. Comment on these results. 
13.4.4 Simulation of Archimedean copulas
We recall that an Archimedean copula has the following expression:

$$C(u_1, u_2) = \varphi^{-1}\left(\varphi(u_1) + \varphi(u_2)\right)$$

where $\varphi$ is the generator function.

1. Retrieve the Genest-MacKay algorithm to simulate Archimedean copulas.

2. We assume that $\varphi(u) = (-\ln u)^{\theta}$ with $\theta \geq 1$. Find the corresponding copula.

3. Calculate the conditional distribution $C_{2|1}$ associated to the previous Archimedean
copula. Deduce an algorithm to simulate it.

4. We consider the Frank copula defined as follows:

$$C(u_1, u_2) = -\frac{1}{\theta} \ln\left(1 + \frac{\left(e^{-\theta u_1} - 1\right)\left(e^{-\theta u_2} - 1\right)}{e^{-\theta} - 1}\right)$$

where $\theta \in \mathbb{R}$. Calculate the conditional distribution $C_{2|1}$ and deduce an algorithm to
simulate this copula.

5. The Ali-Mikhail-Haq family of copulas is given by:

$$C(u_1, u_2) = \frac{u_1 u_2}{1 - \theta (1 - u_1)(1 - u_2)}$$

where $\theta$ lies in [-1, 1]. Verify that the generator of this family is:

$$\varphi(u) = \ln\left(\frac{1 - \theta (1 - u)}{u}\right)$$

6. Simulate 5 random vectors of the Gumbel-Hougaard ($\theta = 1.8$), Frank ($\theta = 2.1$) and
Ali-Mikhail-Haq copulas ($\theta = 0.6$) by using the following uniform random variates:

v_1    0.117    0.607    0.168    0.986    0.765
v_2    0.498    0.400    0.269    0.892    0.109
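The conditional distribution method of Question 4 can be illustrated in Python for the Frank copula with the uniform variates of Question 6. The sketch assumes that $C_{2|1}(u_2 \mid u_1) = \partial_1 C(u_1, u_2)$ and that solving $C_{2|1}(u_2 \mid u_1) = v_2$ gives $u_2 = -\theta^{-1} \ln\left(1 + v_2 (e^{-\theta} - 1) / (v_2 + (1 - v_2) e^{-\theta u_1})\right)$; this closed form is derived for the sketch and should be checked against your own calculation. The function names are hypothetical.

```python
import math


def frank_conditional_cdf(u1, u2, theta):
    """C_{2|1}(u2 | u1): partial derivative of the Frank copula with respect to u1."""
    a = math.exp(-theta * u1)
    b = math.exp(-theta * u2) - 1.0
    d = math.exp(-theta) - 1.0
    return a * b / (d + (a - 1.0) * b)


def frank_conditional_inverse(u1, v, theta):
    """Solve C_{2|1}(u2 | u1) = v for u2."""
    a = math.exp(-theta * u1)
    d = math.exp(-theta) - 1.0
    return -math.log(1.0 + v * d / (v + (1.0 - v) * a)) / theta


def simulate_frank(v1, v2, theta):
    """Conditional distribution method: u1 = v1 and u2 = C_{2|1}^{-1}(v2 | u1)."""
    return v1, frank_conditional_inverse(v1, v2, theta)


v1 = [0.117, 0.607, 0.168, 0.986, 0.765]
v2 = [0.498, 0.400, 0.269, 0.892, 0.109]
vectors = [simulate_frank(a, b, 2.1) for a, b in zip(v1, v2)]
for u1, u2 in vectors:
    print(u1, u2)
```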


13.4.5 Simulation of conditional random variables 


Let $Z \sim \mathcal{N}(\mu_z, \Sigma_{z,z})$ be a Gaussian random vector of dimension $n_z$. We consider the
partition Z = (X, Y) where $n_x + n_y = n_z$, $\mu_z = (\mu_x, \mu_y)$ and:

$$\Sigma_{z,z} = \begin{pmatrix} \Sigma_{x,x} & \Sigma_{x,y} \\ \Sigma_{y,x} & \Sigma_{y,y} \end{pmatrix}$$

1. Let T be the random vector Y given that $X = x^\star$. Give the distribution of T. Deduce
an algorithm to simulate T.

2. We consider the random vector $\tilde{T}$ defined by:

$$\tilde{T} = Y - \Sigma_{y,x} \Sigma_{x,x}^{-1} (X - x^\star)$$

Show that $\tilde{T} = T$ in distribution. Deduce an algorithm to simulate T.

3. How can we simulate the Gaussian random vector Z without using the Cholesky
decomposition?

4. We assume that the vector of means is (1, 2, 3), the vector of standard deviations is
(1, 0.5, 5) and the correlation matrix is:

$$C = \begin{pmatrix} 1.00 & & \\ 0.50 & 1.00 & \\ 0.20 & 0.30 & 1.00 \end{pmatrix}$$

Apply the algorithm described in Question 3 by using the following independent Gaussian
random variates $\mathcal{N}(0, 1)$:

u_1    -1.562    -0.563    -0.573    -0.596     0.984
u_2     0.817     0.845     0.872    -1.303    -0.433
u_3    -0.670     0.126     0.884    -0.918    -0.052

13.4.6 Simulation of the bivariate Normal copula 


Let $X = (X_1, X_2)$ be a standard Gaussian vector with correlation $\rho$. We note $U_1 = \Phi(X_1)$
and $U_2 = \Phi(X_2)$.

1. We note $\Sigma$ the matrix defined as follows:

$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

Calculate the Cholesky decomposition of $\Sigma$. Deduce an algorithm to simulate X.

2. Show that the copula of $(X_1, X_2)$ is the same as the copula of the random vector
$(U_1, U_2)$.

3. Deduce an algorithm to simulate the Normal copula with parameter $\rho$.

4. Calculate the conditional distribution of $X_2$ knowing that $X_1 = x$. Then show that:

$$\Phi_2(x_1, x_2; \rho) = \int_{-\infty}^{x_1} \Phi\left(\frac{x_2 - \rho x}{\sqrt{1 - \rho^2}}\right) \phi(x)\,\mathrm{d}x$$

5. Deduce an expression of the Normal copula.

6. Calculate the conditional copula function $C_{2|1}$. Deduce an algorithm to simulate the
Normal copula with parameter $\rho$.

7. Show that this algorithm is equivalent to the Cholesky algorithm found in Question 3.


13.4.7 Computing the capital charge for operational risk 


We assume that the mapping matrix contains two cells. For each cell, the aggregate loss
$S_k$ is defined as:

$$S_k = \sum_{i=1}^{N_k} X_{k,i}$$

where $N_k \sim \mathcal{P}(\lambda_k)$ is the number of losses and $X_{k,i} \sim \mathcal{LN}(\mu_k, \sigma_k^2)$ are the individual losses.
The total loss for the bank is then equal to:

$$L = S_1 + S_2$$

We calculate the capital-at-risk $\mathrm{CaR}(\alpha)$ for different confidence levels: 90%, 95%, 99% and
99.9%. For that, we use one million simulations.

1. We consider the first cell k = 1 and we assume that $\lambda_1 = 50$, $\mu_1 = 7$ and $\sigma_1 = 1.5$.
Using 100 replications, calculate the mean and standard deviation of the estimator
$\mathrm{CaR}_1(\alpha)$. Do you think that one million simulations is sufficient?

2. Same question for the second cell k = 2 if we assume that $\lambda_2 = 100$, $\mu_2 = 5.5$ and
$\sigma_2 = 1.8$.

3. Represent the probability density function of $\ln L$ when the aggregate losses $S_1$ and
$S_2$ are independent and perfectly dependent. Calculate the diversification ratio when
we assume that $S_1$ and $S_2$ are independent.

4. We assume that the dependence function $C(S_1, S_2)$ is a Normal copula with parameter
$\rho$. Calculate the capital-at-risk of the bank for the following values of $\rho$: 0%, 10%, 20%,
30%, 40%, 50%, 60%, 70%, 80%, 90% and 100%. Compare these estimates with those
obtained with a Gaussian approximation.

5. Same question if the dependence function between $S_1$ and $S_2$ is a $t_4$ copula.

6. Same question if the dependence function between $S_1$ and $S_2$ is a $t_1$ copula.

7. Comment on these results.


13.4.8 Simulating a Brownian bridge 


We consider a Brownian bridge B (t) such that $s \leq t \leq u$, $W(s) = w_s$ and $W(u) = w_u$.

1. Find the distribution of the random vector (W (s), W (t), W (u)).

2. Calculate the conditional distribution of W (t) given that $W(s) = w_s$ and $W(u) = w_u$.

3. Deduce an algorithm to simulate B (t).
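A minimal Python sketch of the resulting simulation step is given below. It relies on the standard result that, conditionally on $W(s) = w_s$ and $W(u) = w_u$, $W(t)$ is Gaussian with mean $w_s + \frac{t - s}{u - s}(w_u - w_s)$ and variance $\frac{(t - s)(u - t)}{u - s}$; this is the answer that Question 2 is expected to produce, and it is stated here as an assumption of the sketch. The function names are hypothetical.

```python
import math
import random


def bridge_moments(s, t, u, w_s, w_u):
    """Conditional mean and variance of W(t) given W(s) = w_s and W(u) = w_u, with s <= t <= u."""
    mean = w_s + (t - s) / (u - s) * (w_u - w_s)
    variance = (t - s) * (u - t) / (u - s)
    return mean, variance


def simulate_bridge(s, t, u, w_s, w_u, rng):
    """Draw one value of B(t) from its conditional Gaussian distribution."""
    mean, variance = bridge_moments(s, t, u, w_s, w_u)
    return mean + math.sqrt(variance) * rng.gauss(0.0, 1.0)


rng = random.Random(2024)
midpoint_moments = bridge_moments(0.0, 0.5, 1.0, 0.0, 2.0)
draw = simulate_bridge(0.0, 0.5, 1.0, 0.0, 2.0, rng)
print(midpoint_moments, draw)
```

Applying the same step recursively on the sub-intervals [s, t] and [t, u] fills in a whole path between the two pinned values.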


13.4.9 Optimal importance sampling 
We consider the estimation of the probability $p = \Pr\{X \geq c\}$ when $X \sim \mathcal{N}(0, 1)$.

1. We note $\hat{p}_{\mathrm{MC}}$ the MC estimator of p for one simulation. Calculate $\mathbb{E}[\hat{p}_{\mathrm{MC}}]$ and
$\mathrm{var}(\hat{p}_{\mathrm{MC}})$. What is the probability distribution of $\hat{p}_{\mathrm{MC}}$?

2. Let $\mathcal{N}(\mu, \sigma^2)$ be the importance sampling distribution. Give the expression of the IS
estimator $\hat{p}_{\mathrm{IS}}$ for one simulation. Calculate $\mathbb{E}[\hat{p}_{\mathrm{IS}}]$ and $\mathrm{var}(\hat{p}_{\mathrm{IS}})$. What do you notice
about the probability distribution of $\hat{p}_{\mathrm{IS}}$?

3. We assume that c = 3. Calculate $\mathrm{var}(\hat{p}_{\mathrm{MC}})$. Draw the relationship between $\mu$ and
$\mathrm{var}(\hat{p}_{\mathrm{IS}})$ when $\sigma$ is respectively equal to 0.8, 1, 2 and 3. Find the optimal value of $\mu$.
What hypothesis can we make?

4. We assume that $\sigma$ is equal to 1. Find the first-order condition if we would like to
select the optimal importance sampling scheme. Draw the relationship between c and
the optimal value of $\mu$. Deduce a heuristic approach for defining a good IS scheme.
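The importance sampling estimator of this exercise can be sketched in Python for the case $\sigma = 1$. The sketch assumes that the likelihood ratio between $\mathcal{N}(0, 1)$ and $\mathcal{N}(\mu, 1)$ is $\phi(x) / \phi(x - \mu) = \exp(-\mu x + \mu^2 / 2)$, and it uses $\mu = c$ only as an illustrative choice, not as the optimal value asked for in Question 4. The function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def importance_sampling_estimate(c, mu, n, rng):
    """IS estimate of Pr{X >= c} for X ~ N(0, 1), sampling from N(mu, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x >= c:
            # likelihood ratio phi(x) / phi(x - mu)
            total += math.exp(-mu * x + 0.5 * mu * mu)
    return total / n


exact = 1.0 - NormalDist().cdf(3.0)
estimate = importance_sampling_estimate(3.0, 3.0, 20000, random.Random(1))
print(exact, estimate)
```

With a plain Monte Carlo estimator and the same number of simulations, only about 27 draws out of 20000 would be expected to exceed c = 3, which explains why shifting the sampling distribution towards c helps.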


Chapter 14

Stress Testing and Scenario Analysis

In 1996, the Basel Committee proposed that banks regularly conduct stress testing programs
in the case of market risk. The underlying idea was to identify events that could generate
exceptional losses and to understand the vulnerability of a bank. The use of stress tests has
increased with the implementation of the Basel II Accord. Indeed, stress testing is at the
core of Pillar 2 supervision, in particular for credit risk. At the same time, stress testing
programs have been extended to the financial sector taken as a whole. In this case, they do
not concern a given financial institution, but a set of banks or institutions. For example,
the financial sector assessment program (FSAP) conducted by the International Monetary
Fund and the World Bank measures the resilience of the financial sector of a given country
or region. In Europe, EBA and ECB are in charge of the EU-wide stress testing. Since the
2008 Global Financial Crisis, they have conducted six stress testing surveys. In the US, the
Fed performs a stress testing program every year that concerns the 30 largest banks. This
annual assessment includes two related programs: the ‘Comprehensive Capital Analysis and
Review’ (CCAR) and the ‘Dodd-Frank Act stress testing’ (DFAST). The objective of the
latter program is to evaluate the impact of stressful economic and financial market conditions
on the bank capital. Recently, the Basel Committee on Banking Supervision has published
a consultative document on stress testing principles. It highlights the growing importance
of stress testing in the banking supervision model:

“Stress testing is now a critical element of risk management for banks and a core
tool for banking supervisors and macroprudential authorities” (BCBS, 2017a, page 5).

For a long time, stress testing mainly concerned market risk, and later credit risk.
In recent years, it has been extended to other risks: funding risk, liquidity risk, and spillover
risk (Tarullo, 2016). Moreover, stress testing is now extended to other financial sectors such
as insurance and asset management. For instance, the Financial Stability Board (2017)
encourages national financial regulators to conduct system-wide stress testing of asset
managers, in particular for measuring the liquidity risk¹. These views are also supported by
IMF (Bouveret, 2017) and some national regulators (AMF, 2017; BaFin, 2017). The
counterparty credit risk is another topic where stress testing could help. This explains why
ESMA and CFTC have conducted specific stress testing of central counterparty clearing
houses. We could then expect that the use of stress testing programs will increase across
financial industries in the coming years, not only at the level of financial institutions, but
also for regulatory purposes.

1“Although such system-wide stress testing exercises are still in an exploratory stage, over time they may
provide useful insights that could help inform both regulatory actions and funds’ liquidity risk management
practices” (FSB, 2017, page 23).
14.1 Stress test framework
14.1.1 Definition
14.1.1.1 General objective

There are several definitions of stress testing, because stress tests can be used for different
objectives. Lopez (2005) describes stress testing as “a risk-management tool used to evaluate
the potential impact on portfolio values of unlikely, although plausible, events or movements
in a set of financial variables”. In this case, stress testing is a complementary tool for VaR
analysis. Jorion (2007) considers that stress testing encompasses scenario analysis and the
impact of stressed model parameters. Scenario analysis consists in measuring the potential
loss due to a given economic or financial stress scenario. For example, the bank could
evaluate the impact on its balance sheet if the world GDP decreases by 5% in the next
two years. Stress testing of model parameters consists in evaluating the impact of stressed
parameters on the P&L or the balance sheet of the bank. For example, the bank could
evaluate the impact of more severe LGD parameters on its risk-weighted assets or the
impact of higher correlations between banks on its CVA P&L. In the case of market risk,
we can use stressed covariance matrices. In this context, stress testing can be viewed as
an extension of the historical value-at-risk (Kupiec, 1998). More generally, stress testing
aims to provide a forward-looking assessment of losses that would be suffered under adverse
economic and financial conditions (BCBS, 2017a). In the case of a trading book, we recall
that the loss of Portfolio w is equal to:

$$L_s(w) = P_t(w) - g(F_{1,s}, \ldots, F_{m,s}; w)$$

where g is the pricing function and $(F_{1,s}, \ldots, F_{m,s})$ is the value of risk factors for the
scenario s. When considering the historical value-at-risk, we calculate the quantile of the
P&L obtained for $n_S$ historical scenarios of risk factors ($s = 1, \ldots, n_S$). When considering
the stress testing, we evaluate the portfolio loss for only one scenario:

$$L_{\mathrm{stress}}(w) = P_t(w) - g(F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}}; w)$$

However, this scenario represented by the risk factors $(F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$ is supposed to
be severe. Contrary to the value-at-risk, stress testing is then not built from a probability
distribution.
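The difference between the two computations can be illustrated with a small Python sketch. It assumes a purely hypothetical linear pricing function $g(F; w) = \sum_i w_i F_i$ and made-up values for the risk factors; the function names and the empirical quantile convention are also assumptions of the sketch.

```python
import math


def portfolio_value(factors, weights):
    """Hypothetical linear pricing function g(F_1, ..., F_m; w)."""
    return sum(w * f for w, f in zip(weights, factors))


def scenario_loss(current_factors, scenario_factors, weights):
    """L_s(w) = P_t(w) - g(F_{1,s}, ..., F_{m,s}; w)."""
    return portfolio_value(current_factors, weights) - portfolio_value(scenario_factors, weights)


def historical_var(current_factors, scenarios, weights, alpha):
    """Empirical alpha-quantile of the losses computed over the historical scenarios."""
    losses = sorted(scenario_loss(current_factors, s, weights) for s in scenarios)
    index = min(len(losses) - 1, math.ceil(alpha * len(losses)) - 1)
    return losses[index]


weights = (10.0, 5.0)
current = (100.0, 50.0)  # current values of the two risk factors
historical = [(101.0, 50.0), (98.0, 49.0), (95.0, 50.0), (103.0, 51.0)]
stress = (70.0, 45.0)  # one single, severe scenario

var_75 = historical_var(current, historical, weights, 0.75)
stress_loss = scenario_loss(current, stress, weights)
print(var_75, stress_loss)
```

The value-at-risk is a quantile of a set of losses, whereas the stress loss is a single number with no probability attached to it.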


14.1.1.2 Scenario design and risk factors 


In the previous section, it may seem that the stress scenario $\mathcal{S}$ is given by the set of risk
factors $\mathcal{F}_{\mathrm{stress}} = (F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$. This is the case only when we consider a historical
scenario, e.g. the stock market crash in 1987 or the bond market crash in 1994. This type
of approach is related to the concept of market price-based stress test, when the stress
scenario is entirely defined by a set of market prices, for example the level of the VIX
index, the return of the S&P 500 index, etc. However, most of the time, the scenario $\mathcal{S}$ is
defined by a set $(S_1, \ldots, S_q)$ of q stress factors, which are not necessarily the market risk
factors of the pricing function. This is particularly true when we consider hypothetical and
macroeconomic stress tests. The difficulty is then to deduce the value of risk factors from
the stress scenario:

$$\mathcal{S} = (S_1, \ldots, S_q) \Rightarrow \mathcal{F}_{\mathrm{stress}} = (F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$$
Let us consider the FSAP stress scenarios used for the assessment of the stability of the
French banking system (De Bandt and Oung, 2004). They tested 13 stress scenarios: 9
single- and multi-factor shocks (F1 to F9) and 4 macroeconomic shocks (M1 to M4). We
report here the F1, F5 and F9 shocks:

F1 flattening of the yield curve due to an increase in interest rates: increase of 150 basis
points (bp) in overnight rates, increase of 50 bp in 10-year rates, with interpolation
for intermediate maturities;

F5 share price decline of 30% in all stock markets;

F9 flattening of the yield curve (increase of 150 basis points in overnight rates, increase
of 50 bp in 10-year rates) together with a 30% drop in stock markets.

We denote by $S_1$ and $S_2$ the stress factors defined by the single-factor shocks F1 and F5.
We have:

F1 : $S_1 \Rightarrow (F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$
F5 : $S_2 \Rightarrow (F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$
F9 : $(S_1, S_2) \Rightarrow (F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$

We notice that F9 corresponds to the simultaneous shocks F1 and F5. It is obvious that
the three shocks will impact the market risk factors $(F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$ differently.
However, the transformation of the stress $\mathcal{S}$ into $\mathcal{F}_{\mathrm{stress}}$ is complex and depends on the
modeling process of the financial institution. For instance, we can imagine that most
models will associate a negative impact on stock markets with the scenario $S_1$. For Bank A, it
could be a 10% drop in stock markets while the model of Bank B may imply a share price
decline of 20% in stock markets. It follows that stress testing is highly model-dependent.
Let us now consider the M2 macroeconomic shock:

M2 increase to USD 40 in the price per barrel of Brent crude for two years (an increase
of 48% compared with USD 27 per barrel in the baseline case), without any reaction
from the central bank; the increase in the price of oil leads to an increase in the general
rate of inflation and a decline in economic activity in France together with a drop in
global demand.

Again, the stress factor $S_3$ can produce different outcomes in terms of market risk factors
depending on the model:

M2 : $S_3 \Rightarrow (F_{1,\mathrm{stress}}, \ldots, F_{m,\mathrm{stress}})$

Therefore, stress testing is more sensitive to modeling choices than value-at-risk. This is the
main drawback of this approach. For instance, if we want to compare two banks, the stress
scenarios must be described more precisely than the shocks above. Moreover,
having the stressed market risk factors of the two banks $\mathcal{F}^{A}_{\mathrm{stress}} = (F^{A}_{1,\mathrm{stress}}, \ldots, F^{A}_{m,\mathrm{stress}})$
and $\mathcal{F}^{B}_{\mathrm{stress}} = (F^{B}_{1,\mathrm{stress}}, \ldots, F^{B}_{m,\mathrm{stress}})$ is also relevant to understand how the initial shock
spreads through the financial system, and the underlying assumptions of the models of
banks A and B. The sensitivity to models and assumptions is even more pronounced in
the case of liquidity stress tests. Indeed, the model must take into account spillover effects
between financial institutions. In the case of funding liquidity, it requires modeling the
network between banks, but also the monetary policy reaction function. In the case of
market liquidity, the losses will depend on the behavior of all market participants, including
asset managers and investors.
The previous introduction shows that we can classify stress scenarios into 4 main categories:

1. historical scenario: “a stress test scenario that aims at replicating the changes in risk
factor shocks that took place in an actual past episode²” (BCBS, 2017a, page 60);

2. hypothetical scenario: “a stress test scenario consisting of a hypothetical set of risk
factor changes, which does not aim to replicate a historical episode of distress” (BCBS,
2017a, page 60);

3. macroeconomic scenario: “a stress test that implements a link between stressed
macroeconomic factors [...] and the financial sustainability of either a single financial
institution or the entire financial system” (BCBS, 2017a, page 61);

4. liquidity scenario: “a liquidity stress test is the process of assessing the impact of an
adverse scenario on institution’s cash flows as well as on the availability of funding
sources, and on market prices of liquid assets” (BCBS, 2017a, page 60).

Concerning hypothetical stress tests, we can also make the distinction between three types of
scenarios: baseline, adverse and severely adverse. Since a baseline scenario corresponds to the
best forecast of future economic conditions, it is not necessarily a stress scenario but serves as
a benchmark. An adverse scenario is a scenario where the economic conditions are assumed
to be worse than for the baseline scenario. The distinction between an adverse and a severely
adverse scenario is the probability of occurrence, which is very low for the latter. Therefore,
we notice that defining a stress scenario is a two-step process. We first have to select the
types of shocks, and then we have to calibrate the severity of the scenario. In Figures 14.1
and 14.2, we have reported the three scenarios of the 2017 Dodd-Frank Act stress test
exercises³ that were developed by the Board of Governors of the Federal Reserve System
(2017). The baseline scenario for the United States is a moderate economic expansion, while
the US economy experiences a moderate recession in the adverse scenario. The severely
adverse scenario is characterized by a severe global recession that is accompanied by a
period of heightened stress in corporate loan markets and commercial real estate markets.
The baseline, adverse and severely adverse scenarios use the same set of stress factors, but
the magnitude of the shocks is different.


14.1.1.3 Firm-specific versus supervisory stress testing 


In the 1990s, stress tests were mainly conducted by banks in order to understand their 
hidden vulnerabilities: 


“The art of stress testing should give the institution a deeper understanding of 
the specific portfolios that could be put in jeopardy given a certain situation. 
The question then would be: Would this be enough to bring down the firm? 
That way, each institution can know exactly what scenario they do not want to 
engage in” (Dunbar and Irving, 1998). 


2According to BCBS (2017a), it may also result from “a combination of changes in risk factor shocks
observed during different past episodes”.
3The data are available at the following website: www.federalreserve.gov/supervisionreg/dfast-archive.htm.

FIGURE 14.1: 2017 DFAST supervisory scenarios: Domestic variables

FIGURE 14.2: 2017 DFAST supervisory scenarios: International variables

More precisely, stress testing first emerged in trading activities. This explains why stress
testing was presented by the 1996 amendment to the capital accord as an additional tool to
the value-at-risk. It was an extreme risk measure, a tool for risk management, a requirement
in order to validate internal models, but it was not used for calculating the regulatory
capital. The Basel 2.5 framework has changed this situation since the capital depends on the
stressed value-at-risk. In trading activities, stress scenarios are mainly historical. Besides the
vulnerability analysis, stress testing has also been extensively used in the setting of trading
limits. In the case of derivatives portfolios, trading limits are defined using sensitivities or
VaR metrics. However, some situations can lead the bank to determine hard trading limits
based on stress testing:


• some trading portfolios are sensitive to parameters that are unobservable or unstable;
for example, a basket option depends on correlations, which can change faster in a crisis
period;

• some underlying assets may become less liquid in a period of stress, for example
volatility indices, dividend futures, small cap stocks, high yield bonds, etc.

In these cases, stress exposure limits are better than delta or vega exposure limits, because
it is difficult to manage portfolios in non-normal situations. In the 2000s, the Basel II Accord
encouraged banks to apply stress testing techniques to credit risk, and to some operational
risk events such as rogue trading. However, firm-wide stress testing made little progress
before the development of supervisory stress tests (CGFS, 2001, 2005).


Supervisory stress tests started in 1996 with the amendment to the Basel I Accord. However,
they mainly concerned micro-prudential analysis. This was also the case with the Basel
II Accord. The development of stress testing for macro-prudential purposes really began
to take off after the Global Financial Crisis. Before 2008, only the financial sector assessment
program (FSAP), which was launched by the International Monetary Fund and the
World Bank, could be considered as a system-wide stress testing exercise. Since the GFC,
supervisory stress tests have become a standard for the different policymakers:


“Regulatory stress tests moved from being small-scale, isolated exercises within 
the broader risk assessment programme, to large-scale, comprehensive risk- 
assessment programmes in their own right leading directly to policy responses” 
(Dent et al., 2016, page 133). 


Most of the time, policymakers and supervisors develop concurrent stress tests, meaning 
that the stress tests are applied to all banks of the system. Generally, these concurrent stress 
tests are used for setting capital buffers of banks. In this case, it is important to distinguish 
stress tests under the constant or dynamic balance sheet assumption (Busch et al., 2017). 
Below, we review three supervisory stress testing frameworks: 


• Financial sector assessment program (FSAP)⁴
The FSAP exercise is conducted by the IMF and the World Bank. It is an in-depth
assessment of a country’s financial sector. According to the IMF, “FSAPs analyze the
resilience of the financial sector, the quality of the regulatory and supervisory framework,
and the capacity to manage and resolve financial crises. Based on its findings,
FSAPs produce recommendations of a micro- and macro-prudential nature, tailored
to country-specific circumstances”. For instance, FSAPs have been conducted for the
following countries in 2017: Bulgaria, China, Finland, India, Indonesia, Japan, Lebanon,
Luxembourg, Netherlands, New Zealand, Saudi Arabia, Spain, Sweden, Turkey and
Zambia. Generally, the FSAP exercise includes one or two stress scenarios.

4The FSAP website is www.imf.org/external/np/fsap/fsap.aspx.

• Dodd-Frank Act stress test (DFAST)⁵
According to the Fed, DFAST is a “forward-looking quantitative evaluation of the
impact of stressful economic and financial market conditions on bank holding companies’
capital”. The results of DFAST are incorporated into the comprehensive capital
analysis and review (CCAR), which evaluates the vulnerability of each bank on an
annual basis. The DFAST exercise includes three types of scenario: baseline, adverse
and severely adverse.

• EU-wide stress testing⁶
EU-wide stress tests are conducted by the European Banking Authority (EBA), the
European Systemic Risk Board (ESRB), the European Central Bank (ECB) and the
European Commission (EC) on a regular basis, generally every two years⁷. According
to the EBA, the aim of such tests is to “assess the resilience of financial institutions
to adverse market developments, as well as to contribute to the overall assessment of
systemic risk in the EU financial system”.

Supervisory stress tests are not limited to these three examples, since most central banks
of developed countries also use stress testing approaches, for instance the Bank of England
(www.bankofengland.co.uk/stress-testing) or the Bank of Japan
(www.boj.or.jp/en/research/brp/fsr/index.htm).


14.1.2 Methodologies 


There are three main approaches for building stress scenarios. The historical approach 
can be viewed as an extension of the historical value-at-risk. The macroeconomic approach 
consists in developing hypothetical scenarios based on a macro-econometric model. Hypo- 
thetical scenarios can also be generated by the probabilistic approach. In this case, the 
probability distribution of risk factors is estimated and extreme scenarios are computed 
analytically or by Monte Carlo simulations. 


14.1.2.1 Historical approach 


This approach was the first method used by banks, in the early 1990s. It
consists in identifying the worst period for a given risk factor. For instance, a stress scenario
for equity markets may be the one that occurred during Black Monday (1987) or the
collapse of Lehman Brothers (2008). A typical adverse scenario for sovereign bonds is the
US interest rate shock in 1994, also known as the ‘great bond massacre’. For currencies and 
commodities, historical stress scenarios can be calibrated using the Mexican peso crisis in 
1994, the Asian crisis in 1997, or the commodity price crash in 2015. This approach is very 
simple and objective since it is based on past values of risk factors. However, it has two 
main drawbacks. First, the past worst scenario is not necessarily a good estimate of a future 
stress scenario. A typical example is the subprime crisis. Second, it is difficult to compare 
the severity of different historical stress scenarios. 

The loss (or drawdown) function is defined by $\mathcal{L}(h) = \min_t R(t;h)$ where $R(t;h)$ is
the asset return for the period $[t, t+h]$. In Table 14.1, we have reported the five worst
values of $R(t;h)$ for the S&P 500 index and different values of $h$. For instance, the worst
daily return is reached on 19 October 1987, where we observe a daily return of
-20.47%. On 15 October 2008, a loss of about 9% is observed. If we consider a monthly period,
the maximum loss is about 30%.

TABLE 14.1: Worst historical scenarios of the S&P 500 index (returns in %)

Sc. | 1D                 | 1W                 | 1M
1   | 1987-10-19  -20.47 | 1987-10-19  -27.33 | 2008-10-27  -30.02
2   | 2008-10-15   -9.03 | 2008-10-09  -18.34 | 1987-10-26  -28.89
3   | 2008-12-01   -8.93 | 2008-11-20  -17.43 | 2009-03-09  -22.11
4   | 2008-09-29   -8.79 | 2008-10-27  -13.85 | 2002-07-23  -19.65
5   | 1987-10-26   -8.28 | 2011-08-08  -13.01 | 2001-09-21  -16.89

Sc. | 2M                 | 3M                 | 6M
1   | 2008-11-20  -37.66 | 2008-11-20  -41.11 | 2009-03-09  -46.64
2   | 1987-10-26  -31.95 | 1987-11-30  -30.17 | 1974-09-13  -34.33
3   | 2002-07-23  -27.29 | 1974-09-13  -28.59 | 2002-10-09  -31.29
4   | 2009-03-06  -26.89 | 2002-07-23  -27.55 | 1962-06-27  -26.59
5   | 1962-06-22  -23.05 | 2009-03-09  -25.63 | 1970-05-26  -25.45

In Figure 14.3, we have reported the drawdown function $\mathcal{L}(h)$. We notice that the
drawdown increases with the time period at the beginning, but decreases when the time period
is sufficiently long. The maximum loss is called the maximum drawdown:

\[ \mathrm{MDD} = \min_{\Delta t} \mathcal{L}(\Delta t) \]

In the case of the S&P 500 index, the maximum drawdown is equal to -56.8% and has
been observed between 9 October 2007 and 9 March 2009.

⁵ The DFAST website is www.federalreserve.gov/supervisionreg/dfa-stress-tests.htm.
⁶ The corresponding website is www.eba.europa.eu/risk-analysis-and-data/eu-wide-stress-testing.
⁷ They took place in 2009, 2010, 2011, 2014, 2016 and 2018.


Remark 173 In practice, the maximum drawdown is calculated using this formula:

\[ \mathrm{MDD} = -\max_t \left( \frac{\max_{s \in [0,t]} P_s - P_t}{\max_{s \in [0,t]} P_s} \right) \]

where $P_t$ is the asset price or the risk factor.
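
The two definitions above can be illustrated with a minimal Python sketch. The function names and the price path are hypothetical and chosen only for illustration; simple (not log) returns are assumed.

```python
import numpy as np

def drawdown_function(prices, h):
    """Worst h-period simple return, i.e. L(h) = min_t R(t; h)."""
    prices = np.asarray(prices, dtype=float)
    returns = prices[h:] / prices[:-h] - 1.0   # R(t; h) for every start date t
    return returns.min()

def maximum_drawdown(prices):
    """MDD = -max_t (running maximum - P_t) / running maximum."""
    prices = np.asarray(prices, dtype=float)
    running_max = np.maximum.accumulate(prices)  # max of P_s for s in [0, t]
    return -np.max((running_max - prices) / running_max)

# Hypothetical price path, for illustration only
prices = [100.0, 120.0, 90.0, 95.0, 60.0, 80.0]
worst_1_period = drawdown_function(prices, 1)   # worst one-period return
mdd = maximum_drawdown(prices)                  # peak 120 to trough 60
```

On this toy path the maximum drawdown is -50% (from 120 to 60), whereas the worst one-period return is smaller in magnitude, which shows why the lag window matters.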


The choice of the lag window $h$ is important. Indeed, defining a stress scenario of -30%
for US stocks is not the same if the time period is one day, one week or one month.
Another important factor is the historical period. For instance, a 50% drawdown for US stocks
has been observed several times in the last 50 years. However, it is not the same thing to consider
the subprime crisis, the dot-com crisis or the 1973-1974 stock market crisis. Even if
these three historical periods experienced similar losses for stocks, the fixed income market
reacted differently. It is then obvious that defining a stress scenario cannot be reduced to a
single number for one risk factor. It is also important to define how the other risk factors
will react and be impacted.


14.1.2.2 Macroeconomic approach 


The macroeconomic approach consists in developing a macroeconometric model and
considering an exogenous shock in order to generate adverse stress scenarios. The advantages
of this approach are manifold. First, the macroeconomic model takes into account
the current economic environment. Stress scenarios are then seen as more plausible than
with the historical approach. For example, the drawdowns observed in the stock market in
1974, 2000 and 2008 are comparable in terms of magnitude, but not in terms of economic
conditions. The origin of a financial market crisis is different each time. This is true for the
stock market, but also for the other asset classes. Macroeconomic modeling may then help
to develop the relationships between risk factors and the interconnectedness between asset
classes for the next crisis. This is why the macroeconomic approach is certainly not better
than the historical approach for defining single-factor stress testing, but it is better suited
to building multi-factor stress testing. Therefore, a second advantage is that it describes the
sequence of the crisis and the dynamics between risk factors. Another advantage is that
many scenarios can be generated by the model. For instance, we have previously seen that
the DFAST program defines three scenarios: baseline, adverse and severely adverse. However,
it is obvious that more scenarios are generated by the model. In the end, only two
or three scenarios are selected, because some of them produce unrealistic outcomes, others
may generate similar results, etc.


FIGURE 14.3: Loss function of the S&P 500 index


[Diagram: exogenous shock → model → risk factors]


FIGURE 14.4: Macroeconomic approach of stress testing 


However, we must be careful with the macroeconomic approach since it also has weaknesses.
Stress testing always contains an element of uncertainty. In the case of the historical
approach, this is obvious since there is no chance that the next crisis will look like a
previous crisis. In the case of the macroeconomic approach, we generally expect to predict
the future crisis, but we certainly expect too much (Borio et al., 2014). In Figure 14.4, we
have represented the traditional way to describe and think about the macroeconomic approach
of stress testing. The model uses input parameters (exogenous shocks) in order to produce
output parameters (risk factors). In real life, the impact of the risk factors on financial
entities is not direct and deterministic. Indeed, we generally observe feedback effects from
the stressed entities ($E_1, \ldots, E_n$) on the economic situation (Figure 14.5). For instance, the
default of one financial institution may lead monetary authorities to change their interest
rate policy. These feedback effects are the most challenging point of the macroeconomic
stress testing framework.


FIGURE 14.5: Feedback effects in stress testing models 


A macroeconomic stress testing model is not only a macro-econometric model that is
based on a reduced form or a vector autoregressive process (Sims, 1980). Modeling activity
(GDP, unemployment rate, etc.), interest rates and inflation (3M and 10Y interest rates, 
CPI, etc.) is the first step of the global process. It must also indicate the impact of the 
economic regime on credit risk parameters (default rates, CDS spreads, recovery rates, etc.) 
and fundamental variables (earnings, dividends, etc.). Finally, it must define the shocks 
on financial asset prices (stocks, bonds, commodities, real estate, etc.). For instance, the 
DFAST program defines 16 domestic and 12 international economic variables: 


• Domestic variables: (1) Real GDP growth; (2) Nominal GDP growth; (3) Real disposable
income growth; (4) Nominal disposable income growth; (5) Unemployment rate;
(6) CPI inflation rate; (7) 3-month Treasury rate; (8) 5-year Treasury yield; (9) 10-year
Treasury yield; (10) BBB corporate yield; (11) Mortgage rate; (12) Prime rate;
(13) Dow Jones Total stock market index (Level); (14) House price index (Level); (15)
Commercial real estate price index (Level); (16) Market volatility index (Level).

• International variables: (1) Euro area real GDP growth; (2) Euro area inflation; (3)
Euro area bilateral dollar exchange rate (USD/euro); (4) Developing Asia real GDP
growth; (5) Developing Asia inflation; (6) Developing Asia bilateral dollar exchange
rate (F/USD, index); (7) Japan real GDP growth; (8) Japan inflation; (9) Japan bilateral
dollar exchange rate (yen/USD); (10) UK real GDP growth; (11) UK inflation;
(12) UK bilateral dollar exchange rate (USD/pound).


These variables concern activity, interest rates and inflation, but also the prices of financial
assets: the scenario for equities is given by the level of the Dow Jones index; the slope of the
yield curve defines the scenario for fixed income instruments; the BBB corporate yield, the
mortgage rate and the prime rate can be used to shape the scenario for credit products like
corporate bonds or CDS; the scenario for real estate is given by house and commercial RE
price indices; the level of the VIX indicates the scenario of the implied volatility for options
and derivatives; the four exchange rates determine the stress scenario for currency markets.

We notice that the DFAST program defines the major trends of the financial asset
prices, but not a detailed scenario for each asset class. For equity markets, only the stress
scenario for US large cap stocks is specified. Using this figure, one has to deduce the stress
scenario for European equities, Japanese equities, EM equities, small cap equities, etc. So,
there is room for interpretation, and there is a gap between the stress scenario given by the
macroeconomic model and the outcome of the stress scenario. Contrary to the historical
approach, the macroeconomic approach requires translating the broad trends into the detailed
path of risk factors. This step can only be done using parametric models: CAPM or APT
models for stocks, Nelson-Siegel model for interest rates, Merton model for credit, etc.


14.1.2.3 Probabilistic approach 


Until now, we have presented the outcome of a stress scenario as an extreme loss. However,
the term ‘extreme’ has little meaning and is not precise. It is obvious that a GDP
growth of -10% is more extreme than a GDP growth of -5%. The extreme nature of a
stress scenario can then be measured by its severity. However, we may wonder if a GDP
growth of -50% is conceivable for instance. There is then a trade-off between the severity
of a stress scenario and its probability or likelihood.


At first approximation, a stress scenario can be seen as an extreme quantile or value-at-risk.
In this case, the aim of stress testing is not to estimate the maximum loss but
an extreme loss. For instance, if we consider the univariate stress scenarios $F_1$ to $F_9$ of De
Bandt and Oung (2004) presented on page 895, the authors indicate that the corresponding
frequency is 1% over the last thirty years. In the case of multivariate stress scenarios, the
probability of $M_2$ is equal to 1% while the probability of $M_5$ is equal to 5%. However, most
of the time, the occurrence probability of a stress scenario is not discussed.


Let $Y$, $X_1$ and $X_2$ be three random variables that we would like to stress. These random
variables may represent macroeconomic variables, market risk factors or parameters of risk
models. We note $S(Y)$, $S(X_1)$ and $S(X_2)$ the corresponding stressed values. Evaluating
the likelihood of a stress scenario consists in calculating its probability of occurrence. The
calculation depends on the relationship between the portfolio loss $L(w)$ and the random
variable to stress. For instance, if the relationship between $L(w)$ and $X_1$ is decreasing, the
probability of the stress $S(X_1)$ is equal to:

\[ \alpha_1 = \Pr\{X_1 \leq S(X_1)\} = F_1(S(X_1)) \]

If the relationship between $L(w)$ and $X_2$ is increasing, we have:

\[ \alpha_2 = \Pr\{X_2 \geq S(X_2)\} = 1 - F_2(S(X_2)) \]


$\alpha_1$ and $\alpha_2$ measure the probability of the univariate stress scenarios $S(X_1)$ and $S(X_2)$.
Similarly, we may compute the joint probability of the stress scenario $(S(X_1), S(X_2))$:

\[
\begin{aligned}
\alpha_{1,2} &= \Pr\{X_1 \leq S(X_1), X_2 \geq S(X_2)\} \\
&= \Pr\{X_1 \leq S(X_1)\} - \Pr\{X_1 \leq S(X_1), X_2 \leq S(X_2)\} \\
&= F_1(S(X_1)) - C_{1,2}(F_1(S(X_1)), F_2(S(X_2)))
\end{aligned}
\]

While the univariate stress scenarios depend on the cumulative distribution functions $F_1$
and $F_2$, the bivariate stress scenario also depends on the copula function $C_{1,2}$ between $X_1$
and $X_2$. If we assume that $X_1$ and $X_2$ are independent, we obtain:

\[ \alpha_{1,2} = \alpha_1 - \alpha_1 \cdot (1 - \alpha_2) = \alpha_1 \cdot \alpha_2 \]

If we assume that $X_1$ and $X_2$ are perfectly dependent ($C_{1,2} = C^{+}$), we have⁸:

\[ \alpha_{1,2} = \alpha_1 - \min(\alpha_1, 1 - \alpha_2) = 0 \]

This result is perfectly normal because $X_1$ and $X_2$ impact $L(w)$ in an opposite way. If
$C_{1,2} = C^{-}$, we have:

\[ \alpha_{1,2} = \alpha_1 - \max(0, \alpha_1 - \alpha_2) = \min(\alpha_1, \alpha_2) \]

We deduce that the probability of the bivariate stress scenario is lower than the probability
of the univariate stress scenarios:

\[ 0 \leq \alpha_{1,2} \leq \min(\alpha_1, \alpha_2) \]
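
As a small numerical check of these three cases, the following Python sketch evaluates the joint probability under the independence, upper Fréchet and lower Fréchet copulas. The function name and the probability levels (1% and 5%) are illustrative assumptions, not values taken from the text.

```python
def joint_stress_probability(alpha1, alpha2, dependence):
    """alpha_{1,2} = F1(S(X1)) - C(F1(S(X1)), F2(S(X2))) for three copulas."""
    u1 = alpha1          # F1(S(X1)) = alpha1 (loss decreasing in X1)
    u2 = 1.0 - alpha2    # F2(S(X2)) = 1 - alpha2 (loss increasing in X2)
    if dependence == "independent":      # product copula
        c = u1 * u2
    elif dependence == "upper":          # C+ (comonotonic)
        c = min(u1, u2)
    elif dependence == "lower":          # C- (countermonotonic)
        c = max(u1 + u2 - 1.0, 0.0)
    else:
        raise ValueError("unknown dependence structure")
    return u1 - c

# Hypothetical univariate probabilities
a1, a2 = 0.01, 0.05
p_indep = joint_stress_probability(a1, a2, "independent")  # alpha1 * alpha2
p_upper = joint_stress_probability(a1, a2, "upper")        # 0
p_lower = joint_stress_probability(a1, a2, "lower")        # min(alpha1, alpha2)
```

The three values bracket the joint probability between 0 and $\min(\alpha_1, \alpha_2)$, as stated by the inequality above.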


We now consider that the stress scenario $S(Y)$ is deduced from $S(X_1)$ and $S(X_2)$. The
conditional probability of the stress scenario $S(Y)$ is then given by:

\[ \alpha = \Pr\{Y \leq S(Y) \mid (X_1, X_2) = (S(X_1), S(X_2))\} \]

It follows that $\alpha$ depends on the conditional distribution of $Y$ given $X_1$ and $X_2$. These
three concepts of probability (univariate, joint and conditional) drive the quantitative
approaches of stress testing that are presented below. They highlight the importance of
quantifying the likelihood of the stress scenario, that is the probability of outcomes.


14.2 Quantitative approaches 


The previous breakdown is used to classify the models into three main categories. The 
univariate case generally consists in modeling the probability distribution of a risk factor in 
an extreme situation. It is generally based on the extreme value theory. The multivariate 
case is a generalization of the first approach, and requires specifying the dependence between 
the risk factors. Copula functions are then the right tool for this task. The third approach 
uses more or less complex econometric models, in particular time series models. 


14.2.1 Univariate stress scenarios 


Let $X$ be the random variable that produces the stress scenario $S(X)$. If $X$ follows the
probability distribution $F$, we have⁹ $\Pr\{X \leq S(X)\} = F(S(X))$. Given a stress scenario
$S(X)$, we may deduce its probability of occurrence:

\[ \alpha = F(S(X)) \]

We may also compute the stressed value given the probability of occurrence $\alpha$:

\[ S(X) = F^{-1}(\alpha) \]

Even if this framework is exactly the approach used by the value-at-risk, there is a big
difference between value-at-risk and stress testing. Indeed, the probability $\alpha$ used for stress
testing is much lower than for value-at-risk.


⁸ We recall that $\alpha_1 \approx 0$ and $\alpha_2 \approx 0$.
⁹ We assume that the relationship between $L(w)$ and $X$ is decreasing.

TABLE 14.2: Probability (in %) associated with the return period $\mathcal{T}$ (in years)

Return period                                      1        5       10       20       30       50
Daily                                         0.3846   0.0769   0.0385   0.0192   0.0128   0.0077
Weekly                                        1.9231   0.3846   0.1923   0.0962   0.0641   0.0385
Monthly                                       8.3333   1.6667   0.8333   0.4167   0.2778   0.1667
$1 - \alpha_{\mathrm{GEV}}$ ($n$ = 20 days)   7.6923   1.5385   0.7692   0.3846   0.2564   0.1538


We recall that the return period $\mathcal{T}$ is related to the probability $\alpha$ by the relationship
$\mathcal{T} = \alpha^{-1}$. We deduce that $\alpha = \mathcal{T}^{-1}$. In Table 14.2, we report the probability $\alpha$ for different
return periods and different frequencies (daily, weekly and monthly). In the case where $F$ is
the cumulative distribution function of daily returns¹⁰, the probability $\alpha$ is equal to 0.0769%
when $\mathcal{T}$ is equal to 5 years, and 0.0128% when $\mathcal{T}$ is equal to 30 years. These are extreme
probabilities in comparison to the confidence level $\alpha = 1\%$ for the value-at-risk. Therefore,
we can use the extreme value theory to calculate these quantities. We reiterate that:

\[ \mathcal{T} = \alpha^{-1} = n \cdot (1 - \alpha_{\mathrm{GEV}})^{-1} \]

where $n$ is the length of the block maxima¹¹.
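
The entries of Table 14.2 follow directly from these relationships. The short Python sketch below recomputes a few of them; the function names are hypothetical, and the conventions of 260 trading days, 52 weeks and 12 months per year are assumed.

```python
TRADING_DAYS = 260  # trading days per year, as assumed in the text

def occurrence_probability(return_period_years, periods_per_year):
    """alpha = 1 / T, with T converted into the sampling frequency."""
    return 1.0 / (return_period_years * periods_per_year)

def gev_tail_probability(return_period_years, block_size_days):
    """1 - alpha_GEV = n / T, with T expressed in trading days."""
    return block_size_days / (return_period_years * TRADING_DAYS)

# Probabilities in % for a 5-year return period
daily_5y = 100.0 * occurrence_probability(5, TRADING_DAYS)
weekly_5y = 100.0 * occurrence_probability(5, 52)
monthly_5y = 100.0 * occurrence_probability(5, 12)
gev_5y = 100.0 * gev_tail_probability(5, 20)   # block size n = 20 days
```

For a 5-year return period, these four values correspond to the second numerical column of Table 14.2.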


TABLE 14.3: GEV parameter estimates (in %) of MSCI USA and MSCI EMU indices

                 Long position           Short position
Parameter      MSCI USA   MSCI EMU    MSCI USA   MSCI EMU
$\mu$             1.242      1.572       1.317      1.599
$\sigma$          0.720      0.844       0.577      0.730
$\xi$            19.363     21.603      26.341     26.494


TABLE 14.4: Stress scenarios (in %) of MSCI USA and MSCI EMU indices

                          Long position           Short position
Year                    MSCI USA   MSCI EMU    MSCI USA   MSCI EMU
5                          -5.86      -7.27        5.69       7.16
10                         -7.06      -8.83        7.01       8.84
25                         -8.92     -11.29        9.17      11.60
50                        -10.56     -13.49       11.18      14.17
75                        -11.62     -14.94       12.54      15.91
100                       -12.43     -16.05       13.59      17.26
Extreme statistic          -9.51     -10.94       11.04      10.87
$\mathcal{T}^{\star}$      32.49      22.24       47.87      20.03


Let us consider the MSCI USA and MSCI EMU indices from 1990 to 2017. We calculate
the daily returns $R_t$. Then we take the block maxima ($X = R_t$) and the block minima
($X = -R_t$) for modeling short and long exposures, respectively. Finally, we estimate the
parameters $(\mu, \sigma, \xi)$ by the method of maximum likelihood and calculate the corresponding
stress scenario $S(X) = \hat{G}^{-1}(1 - n \mathcal{T}^{-1})$ where $\hat{G}$ is the estimated GEV distribution and
$\mathcal{T}$ is expressed in trading days. Results are given in Tables
14.3 and 14.4 and Figure 14.6 when the size of blocks is equal to 20 trading days. We notice
that the magnitude of stress scenarios is higher for the MSCI EMU index than for the MSCI
USA index. For each extreme statistic¹², we have reported the associated return period $\mathcal{T}^{\star}$.
For the MSCI EMU index, $\mathcal{T}^{\star}$ is close to 20 years. For the MSCI USA index, we obtain a
larger return period. This indicates that the stress scenarios for the MSCI USA index may
be underestimated. Therefore, it may be appropriate to take the same stress scenario for
the two indices, because the differences are not justified.

¹⁰ We assume that there are 260 trading days in one year.
¹¹ For instance, when $\mathcal{T}$ is equal to 5 years and $n$ is equal to 20 days, we obtain $1 - \alpha_{\mathrm{GEV}} = 1.5385\%$.

FIGURE 14.6: Stress scenarios (in %) of MSCI USA and MSCI EMU indices
[Plot of the stress scenario against the return time (in years) for MSCI USA (long), MSCI EMU (long), MSCI USA (short) and MSCI EMU (short)]
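
The calculation can be sketched in Python using the standard GEV quantile and distribution functions. The helper names are hypothetical; the parameters are the long MSCI USA estimates of Table 14.3, with $\xi$ converted from percent to a decimal, and the block size $n = 20$ days and 260 trading days per year are the conventions stated in the text.

```python
import math

TRADING_DAYS = 260   # trading days per year
BLOCK_SIZE = 20      # block size n in trading days

def gev_quantile(alpha, mu, sigma, xi):
    """G^{-1}(alpha) for a GEV distribution with xi != 0."""
    return mu - sigma / xi * (1.0 - (-math.log(alpha)) ** (-xi))

def gev_cdf(x, mu, sigma, xi):
    """G(x) = exp(-(1 + xi (x - mu) / sigma)^(-1/xi))."""
    return math.exp(-(1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi))

def stress_scenario(return_period_years, mu, sigma, xi):
    """S(X) = G^{-1}(1 - n / T) with T converted into trading days."""
    alpha_gev = 1.0 - BLOCK_SIZE / (return_period_years * TRADING_DAYS)
    return gev_quantile(alpha_gev, mu, sigma, xi)

def implied_return_period(x, mu, sigma, xi):
    """Return period (in years) implied by a stress of magnitude x."""
    return BLOCK_SIZE / (TRADING_DAYS * (1.0 - gev_cdf(x, mu, sigma, xi)))

# Long MSCI USA (block minima, X = -R_t); mu and sigma in %, xi in decimal
MU, SIGMA, XI = 1.242, 0.720, 0.19363
stress_5y = stress_scenario(5, MU, SIGMA, XI)              # magnitude in %
t_star = implied_return_period(9.51, MU, SIGMA, XI)        # extreme statistic
```

With these inputs, the 5-year stress has a magnitude of about 5.86% (reported as -5.86% for the long position) and the extreme statistic of 9.51% has an implied return period close to 32.5 years, in line with Table 14.4 up to parameter rounding.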


14.2.2 Joint stress scenarios 
14.2.2.1 The bivariate case 


Let $X_{n:n,1}$ and $X_{n:n,2}$ be the maximum order statistics of the random variables $X_1$
and $X_2$. We note $p = \Pr\{X_{n:n,1} > S(X_1), X_{n:n,2} > S(X_2)\}$ the joint probability of stress
scenarios $(S(X_1), S(X_2))$. We have:

\[
\begin{aligned}
p &= 1 - \Pr\{X_{n:n,1} \leq S(X_1)\} - \Pr\{X_{n:n,2} \leq S(X_2)\} + \Pr\{X_{n:n,1} \leq S(X_1), X_{n:n,2} \leq S(X_2)\} \\
&= 1 - F_1(S(X_1)) - F_2(S(X_2)) + C(F_1(S(X_1)), F_2(S(X_2))) \\
&= \bar{C}(F_1(S(X_1)), F_2(S(X_2)))
\end{aligned}
\]

where $\bar{C}(u_1, u_2) = 1 - u_1 - u_2 + C(u_1, u_2)$. We deduce that the failure area is represented
by:

\[ \left\{ (S(X_1), S(X_2)) \in \mathbb{R}_+^2 \mid \bar{C}(F_1(S(X_1)), F_2(S(X_2))) \leq \frac{n}{\mathcal{T}} \right\} \]

Given a return period $\mathcal{T}$, we do not have a unique joint stress scenario $(S(X_1), S(X_2))$, but
an infinite number of bivariate stress scenarios.

¹² They correspond to the minimum and maximum of daily returns.
The previous result argues for computing the implied return period of a given scenario,
and not the opposite:

\[ \mathcal{T} = \frac{n}{\bar{C}(F_1(S(X_1)), F_2(S(X_2)))} \]

In the univariate case, the implied return period of the stress $S(X_i)$ is equal to:

\[ \mathcal{T}_i = \frac{n}{1 - F_i(S(X_i))} \]

Since an extreme value copula satisfies the property $C^{\perp} \leq C \leq C^{+}$, we deduce that
(with $n$, $\mathcal{T}_1$ and $\mathcal{T}_2$ expressed in the same time unit):

\[ \max(\mathcal{T}_1, \mathcal{T}_2) \leq \mathcal{T} \leq \frac{\mathcal{T}_1 \mathcal{T}_2}{n} \]

In Table 14.5, we report the upper and lower bounds of $\mathcal{T}$ for different values of $n$, $\mathcal{T}_1$ and $\mathcal{T}_2$
by assuming that a year contains 260 trading days. We observe that the range of $\mathcal{T}$ is wide.
For instance, when $n$ is equal to 20 days, and $\mathcal{T}_1$ and $\mathcal{T}_2$ are equal to 5 years, the return
period of the joint stress scenario is equal to 5 years if the two scenarios are completely
dependent and 325 years if they are independent.


TABLE 14.5: Upper and lower bounds of the return time $\mathcal{T}$ (in years)

$n$ (in days)   $\mathcal{T}_1$   $\mathcal{T}_2$   Lower bound   Upper bound
      1               5                 5                 5          6500
      5               5                 5                 5          1300
     20               5                 5                 5           325
    260               5                 5                 5            25
    260              10                 5                10            50
    260               1                 1                 1             1
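
The bounds of Table 14.5 can be recomputed with a few lines of Python. This is a sketch under the conventions of the text (return periods in years, block size in trading days, 260 trading days per year); the function name is hypothetical.

```python
TRADING_DAYS = 260  # trading days per year

def joint_return_period_bounds(n_days, t1_years, t2_years):
    """Bounds (in years) of the joint return period for an extreme value copula."""
    lower = max(t1_years, t2_years)                          # C = C+ (complete dependence)
    upper = TRADING_DAYS * t1_years * t2_years / n_days      # C = product copula (independence)
    return lower, upper

bounds_20 = joint_return_period_bounds(20, 5, 5)
bounds_1 = joint_return_period_bounds(1, 5, 5)
bounds_mixed = joint_return_period_bounds(260, 10, 5)
```

The factor 260 converts the return periods from years into trading days before dividing by $n$, which is why the upper bound shrinks as the block size grows.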


We consider the previous example with MSCI USA and EMU indices. We have reported
the failure area in Figure 14.7. For that, we have estimated the copula $C$ by assuming a
Gumbel copula function:

\[ C(u_1, u_2) = \exp\left( -\left( (-\ln u_1)^{\theta} + (-\ln u_2)^{\theta} \right)^{1/\theta} \right) \]

We estimate $\theta$ by the method of maximum likelihood for each quadrant and obtain the
following results:

MSCI USA    Positive   Positive   Negative   Negative
MSCI EMU    Positive   Negative   Negative   Positive
$\theta$      1.7087     1.4848     1.7430     1.4697

This means that $\theta$ is equal to 1.7087 if the stresses for MSCI USA and EMU indices are
both positive. We have also reported the solution in the two extreme cases $C^{\perp}$ and $C^{+}$.
We observe that the dependence plays a major role when considering joint scenarios. For
instance, if we consider a scenario of -10% for the MSCI USA index and -10% for the
MSCI EMU index, the return period is respectively equal to 39.9, 55.1 and 8197 years for
the upper Fréchet ($C^{+}$), Gumbel and product ($C^{\perp}$) copulas.
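
The three return periods of this example can be recomputed, up to parameter rounding, with the Python sketch below. It assumes the long-position GEV estimates of Table 14.3 ($\xi$ converted to a decimal), the Gumbel parameter of the (Negative, Negative) quadrant, blocks of 20 days and 260 trading days per year; the function names are hypothetical.

```python
import math

TRADING_DAYS, BLOCK_SIZE = 260, 20

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function for xi != 0."""
    return math.exp(-(1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi))

def gumbel_copula(u1, u2, theta):
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-s ** (1.0 / theta))

def joint_return_period(u1, u2, copula_value):
    """T = n / C-bar(u1, u2), converted into years."""
    survival = 1.0 - u1 - u2 + copula_value
    return BLOCK_SIZE / (TRADING_DAYS * survival)

# A loss of 10% on both indices (block minima, so X = -R_t = 10)
u1 = gev_cdf(10.0, 1.242, 0.720, 0.19363)   # MSCI USA, long position
u2 = gev_cdf(10.0, 1.572, 0.844, 0.21603)   # MSCI EMU, long position

periods = {
    "upper_frechet": joint_return_period(u1, u2, min(u1, u2)),
    "gumbel": joint_return_period(u1, u2, gumbel_copula(u1, u2, 1.7430)),
    "product": joint_return_period(u1, u2, u1 * u2),
}
```

The ordering of the three results illustrates the bounds derived earlier: complete dependence gives the shortest return period, independence the longest, and the estimated Gumbel copula lies in between.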
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FIGURE 14.7: Failure area of MSCI USA and MSCI EMU indices (blockwise dependence) 


Remark 174 The previous exercise illustrates the limits of blockwise analysis. Let us consider
the case of a negative stress for the MSCI USA index and a positive stress for the
MSCI EMU index. When $n$ is equal to 20 days, we calculate for each block the worst daily
return for the first index and the best daily return for the second index. However, over 4
weeks, these two extreme returns almost certainly do not occur on the same day. It follows that the
dependence is overestimated for the two quadrants (Positive, Negative) and (Negative, Positive).
This is why it is better to estimate the copula function using daily returns and not
blockwise data. In this case, we obtain the results given in Figure 14.8.


14.2.2.2 The multivariate case 


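In the multivariate case, the failure area is defined by:

\[ \left\{ (S(X_1), \ldots, S(X_p)) \in \mathbb{R}_+^p \mid \bar{C}(F_1(S(X_1)), \ldots, F_p(S(X_p))) \leq \frac{n}{\mathcal{T}} \right\} \]

where:

\[ \bar{C}(u_1, \ldots, u_p) = \sum_{i=0}^{p} \left[ (-1)^i \sum_{v \in \mathcal{Z}(p-i,p)} C(v) \right] \]

and $\mathcal{Z}(m, p)$ denotes $\{ v \in [0,1]^p \mid \sum_{i=1}^{p} \mathbb{1}\{v_i = 1\} = m \}$, each component $v_i$ being
either $u_i$ or 1. In the case $p = 2$, we retrieve the previous expression:

\[ \bar{C}(u_1, u_2) = 1 - u_1 - u_2 + C(u_1, u_2) \]

When $p$ is equal to 3, we obtain:

\[ \bar{C}(u_1, u_2, u_3) = 1 - u_1 - u_2 - u_3 + C(u_1, u_2) + C(u_1, u_3) + C(u_2, u_3) - C(u_1, u_2, u_3) \]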
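
The inclusion-exclusion sum above can be written generically in Python. This is a sketch, not the author's implementation: `survival_function` and `product_copula` are hypothetical names, and the copula is assumed to be supplied as a function of a full vector $v \in [0,1]^p$.

```python
from itertools import combinations
from math import prod

def survival_function(copula, u):
    """C-bar(u): sum over i of (-1)^i times the sum of C(v), where v keeps
    i components of u and sets the other p - i components to 1."""
    p = len(u)
    total = 0.0
    for i in range(p + 1):
        for kept in combinations(range(p), i):
            v = [u[j] if j in kept else 1.0 for j in range(p)]
            total += (-1) ** i * copula(v)
    return total

def product_copula(v):
    """Independence copula, used here only as a check."""
    return prod(v)

u = (0.9, 0.8, 0.7)
cbar = survival_function(product_copula, u)   # equals (1-0.9)(1-0.8)(1-0.7)
```

Under independence the result must equal the product of the marginal exceedance probabilities, which gives a simple sanity check. The number of terms grows as $2^p$, so this brute-force form is only practical for a small number of risk factors.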
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FIGURE 14.8: Failure area of MSCI USA and MSCI EMU indices (daily dependence) 


Remark 175 Bouyé et al. (2000) used this framework for evaluating stress scenarios
associated with five commodities of the London Metal Exchange. Since commodity returns are
not necessarily positively correlated, they showed that simply collecting univariate stress scenarios
to form a multivariate stress scenario is completely biased. In particular, they presented an
example where the return period of univariate stress scenarios is 5 years while the return
period of the multivariate stress scenario is 50000 years.


14.2.3 Conditional stress scenarios 


In supervisory stress testing, the goal is to impact the parameters of the risk model 
according to a given scenario. For example, these parameters may be the systematic risk 
factor in market risk factors, the probability of default and the loss given default in credit 
risk modeling, or the frequency of the Poisson distribution and the parameters of the severity 
distribution in operational risk modeling. Therefore, we have to estimate the relationship 
between these parameters and the variables of the scenario, and deduce their stressed values. 


14.2.3.1 The conditional expectation solution 


Let us assume a linear model between the dependent variable $Y_t$ and the explanatory
variables $X_t = (X_{1,t}, \ldots, X_{n,t})$:

\[ Y_t = \beta_0 + \sum_{i=1}^{n} \beta_i X_{i,t} + \varepsilon_t \]

where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$. By assuming that the standard properties of the linear regression
model hold, we obtain:

\[ \mathbb{E}[Y_t] = \beta_0 + \sum_{i=1}^{n} \beta_i \mathbb{E}[X_{i,t}] \]

We can also calculate the conditional expectation of $Y_t$:

\[ \mathbb{E}[Y_t \mid X_t = (x_1, \ldots, x_n)] = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \]

Given a joint stress scenario $S(X) = (S(X_1), \ldots, S(X_n))$, we deduce the conditional stress
scenario of $Y$ and we have:

\[ S(Y) = \mathbb{E}[Y_t \mid X_t = (S(X_1), \ldots, S(X_n))] = \beta_0 + \sum_{i=1}^{n} \beta_i S(X_i) \]


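In some cases, assuming a linear relationship is not relevant, in particular for the probability
of default or the loss given default. It is then common to use the following transformation
(Dees et al., 2017):

\[ Z_t = \ln\left( \frac{Y_t}{1 - Y_t} \right) \]

We have:

\[ Y_t = \frac{e^{Z_t}}{1 + e^{Z_t}} = h(Z_t) \]

where $h(z)$ is the logistic function, that is the inverse of the logit transformation. We verify
that $Y_t \in [0,1]$. Since the statistical model becomes $Z_t = \beta_0 + \sum_{i=1}^{n} \beta_i X_{i,t} + u_t$ with
$u_t \sim \mathcal{N}(0, \sigma^2)$, we deduce that:

\[ \mathbb{E}[Y_t \mid X_t = (x_1, \ldots, x_n)] = \int_{-\infty}^{\infty} h\left( \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sigma \omega \right) \phi(\omega) \, \mathrm{d}\omega \tag{14.1} \]

where $\phi$ is the probability density function of the standard normal distribution.
This conditional expectation can be calculated thanks to numerical integration algorithms.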


Remark 176 The previous model can also be extended in order to take into account fixed
effects (panel data) or lag dynamics. For instance, we can use an ARX(p) model:

\[ Y_t = \beta_0 + \sum_{i=1}^{p} \phi_i Y_{t-i} + \sum_{i=1}^{n} \beta_i X_{i,t} + u_t \]


Example 161 We assume that the probability of default $\mathrm{PD}_t$ at time $t$ is explained by the
following linear regression model:

\[ \ln\left( \frac{\mathrm{PD}_t}{1 - \mathrm{PD}_t} \right) = -2.5 - 5 g_t - 3 \pi_t + 2 u_t + \varepsilon_t \]

where $\varepsilon_t \sim \mathcal{N}(0, 0.25)$, $g_t$ is the growth rate of the GDP, $\pi_t$ is the inflation rate, and $u_t$ is
the unemployment rate. The baseline scenario is defined by $g_t = 2\%$, $\pi_t = 2\%$ and $u_t = 5\%$.


In Figure 14.9, we have reported the probability density function of $\mathrm{PD}_t$ for the baseline
scenario and the following stress scenario: $g_t = -8\%$, $\pi_t = 5\%$ and $u_t = 10\%$. The conditional
expectation¹³ is respectively equal to 7.90% and 12.36%. The figure of 7.90% can
be interpreted as the long-run (or unconditional) probability of default that is used in the
IRB formula. The relationship between the macroeconomic variables and the conditional
expectation of $\mathrm{PD}_t$ is shown in Figure 14.10. For each panel, we consider the baseline
scenario and we vary one parameter each time. In Table 14.6, we consider a stress scenario
for the next 3 years, and we indicate the values taken by $g_t$, $\pi_t$ and $u_t$ for each quarter
$t$. Then, we calculate the conditional expectation and the conditional quantile at the 90%
confidence level of the probability of default $\mathrm{PD}_t$. The stress scenario occurs at time $t = 1$
and propagates until $t = 12$. This is why we initially observe a jump in the probability
of default, since it goes from 7.90% to 11.45%. The conditional expectation continues to
increase and peaks at 14.03%. Then, it decreases and we obtain a new equilibrium
after 3 years.

¹³ We use a Gauss-Legendre quadrature method with an order of 512 for computing the conditional
expectation given by Equation (14.1).

FIGURE 14.9: Probability density function of $\mathrm{PD}_t$

FIGURE 14.10: Relationship between the macroeconomic variables and $\mathrm{PD}_t$

TABLE 14.6: Stress scenario of the probability of default (all values in %)

 $t$    $g_t$   $\pi_t$   $u_t$   $\mathbb{E}[\mathrm{PD}_t \mid S(X)]$   $q_{90\%}(S(X))$
  0     2.00     2.00     5.00        7.90        12.78
  1    -6.00     2.00     6.00       11.45        18.26
  2    -7.00     1.00     7.00       12.47        19.79
  3    -9.00     1.00     9.00       14.03        22.14
  4    -7.00     1.00    10.00       13.12        20.78
  5    -7.00     2.00    11.00       13.01        20.59
  6    -6.00     2.00    10.00       12.26        19.49
  7    -4.00     4.00     9.00       10.49        16.80
  8    -2.00     3.00     8.00        9.70        15.58
  9    -1.00     3.00     7.00        9.11        14.68
 10     2.00     3.00     6.00        7.82        12.68
 11     4.00     3.00     6.00        7.14        11.60
 12     4.00     3.00     6.00        7.14        11.60
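
The conditional expectation and the conditional quantile of Example 161 can be approximated with the following Python sketch. It uses a Gauss-Hermite rule (`numpy.polynomial.hermite.hermgauss`) rather than the Gauss-Legendre rule mentioned in the footnote, which is a substitution made here for convenience; the function names and the quadrature order are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from statistics import NormalDist

# Coefficients of Example 161; SIGMA is the square root of the variance 0.25
BETA_0, BETA_G, BETA_PI, BETA_U, SIGMA = -2.5, -5.0, -3.0, 2.0, 0.5

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_index(g, pi, u):
    return BETA_0 + BETA_G * g + BETA_PI * pi + BETA_U * u

def expected_pd(g, pi, u, order=64):
    """E[PD | scenario] via Gauss-Hermite quadrature (weight exp(-x^2))."""
    x, w = hermgauss(order)
    z = linear_index(g, pi, u) + SIGMA * np.sqrt(2.0) * x
    return float(np.sum(w * logistic(z)) / np.sqrt(np.pi))

def quantile_pd(g, pi, u, alpha=0.90):
    """Conditional quantile of PD: the logistic function is increasing."""
    shift = SIGMA * NormalDist().inv_cdf(alpha)
    return float(logistic(linear_index(g, pi, u) + shift))

pd_baseline = expected_pd(0.02, 0.02, 0.05)     # baseline scenario
pd_stress = expected_pd(-0.08, 0.05, 0.10)      # stress scenario
q90_baseline = quantile_pd(0.02, 0.02, 0.05)
```

Under these assumptions the sketch gives values close to 7.9% and 12.4% for the two conditional expectations and about 12.8% for the baseline 90% quantile, consistent with the figures quoted in the text.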


In the previous example, we have also reported the conditional quantile $q_{90\%}(S(X))$.
We observe that its values are larger than those given by the conditional expectation
$\mathbb{E}[\mathrm{PD}_t \mid S(X)]$. These differences raise the question of defining a conditional stress scenario.
Indeed, the previous framework defines the conditional stress scenario as the conditional
expectation of the linear model $Y_t = \beta_0 + \sum_{i=1}^{n} \beta_i X_{i,t} + \varepsilon_t$. In this case, the vector of
parameters $\beta = (\beta_0, \beta_1, \ldots, \beta_n)$ is estimated by ordinary least squares. We could also define
the conditional stress scenario $S(Y) = q_{\alpha}(S(X))$ as the solution of the quantile regression:

\[ \Pr\{Y_t \leq q_{\alpha}(S) \mid X_t = S\} = \alpha \]

In this case, we can use the tools presented on pages 613 (parametric approach) and 643
(non-parametric approach). The parametric approach assumes that the joint probability
distribution of $Y$ and $X$ is Gaussian. The non-parametric approach is better suited when
this assumption is not satisfied, for example when the stochastic dependence is not linear.


14.2.3.2 The conditional quantile solution 


In order to understand the impact of the dependence on the conditional stress scenario,
we consider again the copula framework. If we consider the bivariate random vector $(X, Y)$,
using the linear regression is equivalent to assuming that:

\[ Y_t = \mathbb{E}[Y_t \mid X_t = x] + \varepsilon_t \]

This implies that $(X, Y)$ is a bivariate Gaussian random vector. The average dependence
structure between $X$ and $Y$ is then linear and can be represented by the parametric function
$y = m(x)$ where $m(x)$ is the conditional expectation function $\mathbb{E}[Y_t \mid X_t = x]$. However, the
conditional expectation is not appropriate when $(X, Y)$ is not Gaussian.


The statistical framework We have defined the conditional quantile function $q_{\alpha}(x)$ as
the solution of the equation $\Pr\{Y \leq q_{\alpha}(x) \mid X = x\} = \alpha$. Let $F(x, y)$ be the probability
distribution of $(X, Y)$. By using the integral transforms $U_1 = F_x(X)$ and $U_2 = F_y(Y)$
where $F_x$ and $F_y$ are the marginal distributions, we have:

\[ \Pr\{Y \leq F_y^{-1}(u_2) \mid X = F_x^{-1}(u_1)\} = \alpha \]

where $u_1 = F_x(x)$ and $u_2 = F_y(y) = F_y(q_{\alpha}(x))$. It follows that the quantile regression of
$Y$ on $X$ is equivalent to solving the following statistical problem:

\[ \Pr\{U_2 \leq u_2 \mid U_1 = u_1\} = \alpha \]

or:

\[ \partial_1 C(u_1, u_2) = \alpha \]

where $C(u_1, u_2)$ is the copula function associated with the probability distribution $F(x, y)$. We
have $u_2 = C_{2|1}^{-1}(u_1, \alpha)$ where $C_{2|1}(u_1, u_2) = \partial_1 C(u_1, u_2)$. It follows that:

\[ F_y(y) = C_{2|1}^{-1}(F_x(x), \alpha) \]

Finally, we obtain $y = q_{\alpha}(x)$ where:

\[ q_{\alpha}(x) = F_y^{-1}\left( C_{2|1}^{-1}(F_x(x), \alpha) \right) \]

Remark 177 In the case where $X$ and $Y$ are independent, we have $C(u_1, u_2) = u_1 u_2$,
$\partial_1 C(u_1, u_2) = u_2$, $C_{2|1}^{-1}(u_1, \alpha) = \alpha$ and:

\[ y = q_{\alpha}(x) = F_y^{-1}(\alpha) \]

Therefore, the conditional quantile $q_{\alpha}(x)$ of $Y$ with respect to $X = x$ is equal to the
unconditional quantile $F_y^{-1}(\alpha)$ of $Y$.


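Some special cases Let us assume that the dependence structure is a Normal copula
with parameter $\rho$. On page 737, we have shown that:

\[ C(u_1, u_2; \rho) = \int_0^{u_1} \Phi\left( \frac{\Phi^{-1}(u_2) - \rho \Phi^{-1}(u)}{\sqrt{1 - \rho^2}} \right) \mathrm{d}u \]

We deduce that:

\[ \partial_1 C(u_1, u_2) = \Phi\left( \frac{\Phi^{-1}(u_2) - \rho \Phi^{-1}(u_1)}{\sqrt{1 - \rho^2}} \right) \]

Solving the equation $\partial_1 C(u_1, u_2) = \alpha$ gives:

\[ u_2 = \Phi\left( \rho \Phi^{-1}(u_1) + \sqrt{1 - \rho^2} \, \Phi^{-1}(\alpha) \right) \]

The conditional quantile function is then:

\[ y = q_{\alpha}(x) = F_y^{-1}\left( \Phi\left( \rho \Phi^{-1}(F_x(x)) + \sqrt{1 - \rho^2} \, \Phi^{-1}(\alpha) \right) \right) \]

In the case of the Student’s t copula, we have demonstrated that¹⁴:

\[ C_{2|1}(u_1, u_2; \rho, \nu) = T_{\nu+1}\left( \left( \frac{\nu + 1}{\nu + \left[ T_{\nu}^{-1}(u_1) \right]^2} \right)^{1/2} \frac{T_{\nu}^{-1}(u_2) - \rho T_{\nu}^{-1}(u_1)}{\sqrt{1 - \rho^2}} \right) \]

Solving the equation $C_{2|1}(u_1, u_2; \rho, \nu) = \alpha$ gives:

\[ u_2 = T_{\nu}\left( \rho T_{\nu}^{-1}(u_1) + \eta \, T_{\nu+1}^{-1}(\alpha) \right) \]

where:

\[ \eta = \left( \frac{\left( \nu + \left[ T_{\nu}^{-1}(u_1) \right]^2 \right) (1 - \rho^2)}{\nu + 1} \right)^{1/2} \]
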
Illustration Let us consider an example with two asset returns $(R_{1,t}, R_{2,t})$. We assume
that they follow a bivariate Gaussian distribution with $\mu_1 = 3\%$, $\mu_2 = 5\%$, $\sigma_1 = 10\%$,
$\sigma_2 = 20\%$ and $\rho = -20\%$. In Figure 14.11, we have reported the conditional quantile function
$R_{2,t} = q_{\alpha}(R_{1,t})$ for different confidence levels $\alpha$. We verify that the median regression
corresponds to the linear regression. The quantile regression shifts the intercept downward when
$\alpha < 50\%$ and upward when $\alpha > 50\%$. We now assume two variants of this example:

1. the dependence structure is the previous Normal copula, but the marginal distributions
follow a Student’s $t_1$ distribution¹⁵;

2. the marginal distributions are the previous Gaussian distributions, but the dependence
structure is a Student’s $t_1$ copula.

Results are given in Figures 14.12 and 14.13. We deduce that the linearity of the conditional
quantile vanishes if the marginals are not Gaussian or the dependence structure is
not Gaussian. In the first case, assuming a linear dependence between $R_{1,t}$ and $R_{2,t}$ implies
overestimating on average the conditional return $R_{2,t} \mid R_{1,t}$ when the first asset has large
negative returns. In the second case, we obtain the opposite result.
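
For the Gaussian case of this illustration, the conditional quantile function can be evaluated with the Python standard library only. This is a minimal sketch using the parameters given above; the function names and the conditioning value of -20% are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

MU_1, MU_2 = 0.03, 0.05
SIGMA_1, SIGMA_2 = 0.10, 0.20
RHO = -0.20

def conditional_quantile(r1, alpha):
    """q_alpha(x) = F_y^{-1}(Phi(rho Phi^{-1}(F_x(x)) + sqrt(1 - rho^2) Phi^{-1}(alpha)))."""
    std = NormalDist()
    u1 = NormalDist(MU_1, SIGMA_1).cdf(r1)
    u2 = std.cdf(RHO * std.inv_cdf(u1) + sqrt(1.0 - RHO ** 2) * std.inv_cdf(alpha))
    return NormalDist(MU_2, SIGMA_2).inv_cdf(u2)

def linear_regression(r1):
    """Conditional expectation of R2 given R1 = r1 in the Gaussian case."""
    return MU_2 + RHO * SIGMA_2 / SIGMA_1 * (r1 - MU_1)

r1 = -0.20                                   # hypothetical stress on the first asset
q_median = conditional_quantile(r1, 0.50)    # coincides with the linear regression
q_low = conditional_quantile(r1, 0.10)
q_high = conditional_quantile(r1, 0.90)
```

In the Gaussian case the quantile curves are parallel lines, so `q_low` and `q_high` differ from the regression line by the constant $\pm \sigma_2 \sqrt{1 - \rho^2} \, \Phi^{-1}(0.90)$; this property is lost in the two variants discussed above.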


¹⁴ See page 738.
¹⁵ We have
FIGURE 14.11: Conditional quantile (Gaussian distribution)
FIGURE 14.12: Conditional quantile (Normal copula and Student’s t marginals)
FIGURE 14.13: Conditional quantile (Student’s t copula and Gaussian marginals)


14.2.4 Reverse stress testing 


According to EBA (2018b), a reverse stress test “means an institution stress test that
starts from the identification of the pre-defined outcome (e.g. points at which an institution 
business model becomes unviable, or at which the institution can be considered as failing 
or likely to fail) and then explores scenarios and circumstances that might cause this to 
occur”. The underlying idea is then to identify the set of risk factors that may cause the 
bankruptcy of the bank (or the financial institution). The difference between stress testing 
and reverse stress testing can be summarized as follows: 


• In stress testing, extreme scenarios of risk factors are used to test the viability of the bank:

$$(S(F_1), \ldots, S(F_m)) \Rightarrow S(L(w)) \Rightarrow \begin{cases} D = 0 & \text{if } S(L(w)) < C \\ D = 1 & \text{otherwise} \end{cases}$$


Using the set of stressed risk factors, we then compute the corresponding loss S (L (w)) 
of the portfolio. This stress can cause the default of the bank if the stressed loss is 
larger than its capital C. 


• In reverse stress testing, extreme scenarios of risk factors are deduced from the bankruptcy scenario:

$$D = 1 \Rightarrow RS(L(w)) \Rightarrow (RS(F_1), \ldots, RS(F_m))$$


We first assume that the bank defaults and compute the associated stressed loss. 
Then, we deduce the implied set of risk factors that has produced the bankruptcy. 


Therefore, reverse stress testing can be viewed as an inverse problem, which can face very quickly a curse of dimensionality.

14.2.4.1 Mathematical computation of reverse stress testing

We assume that the portfolio loss is a function of the risk factors:

$$L(w) = \ell(F_1, \ldots, F_m; w)$$

Let $(S(F_1), \ldots, S(F_m))$ be the stress scenario. The associated loss is given by:

$$S(L(w)) = \ell(S(F_1), \ldots, S(F_m); w)$$


Reverse stress testing assumes that the financial institution has calculated the reverse 
stressed loss RS (L (w)) that may produce its bankruptcy. It follows that the reverse stress 
scenario RS is the set of risk factors that corresponds to this stressed loss: 


$$RS = \{(RS(F_1), \ldots, RS(F_m)) : \ell(S(F_1), \ldots, S(F_m); w) = RS(L(w))\}$$


Since we have one equation with $m$ unknowns, there is not a unique solution except in some degenerate cases. The issue is then to choose the most plausible reverse stress scenario. For instance, we can consider the following optimization program¹⁶:

$$\begin{aligned} (RS(F_1), \ldots, RS(F_m)) = {} & \arg\max \ln f(F_1, \ldots, F_m) \\ \text{s.t. } & \ell(S(F_1), \ldots, S(F_m); w) = RS(L(w)) \end{aligned} \qquad (14.2)$$

where $f(x_1, \ldots, x_m)$ is the probability density function of the risk factors $(F_1, \ldots, F_m)$.

The linear Gaussian case   We assume that $F \sim N(\mu_F, \Sigma_F)$ and $L(w) = \sum_{j=1}^{m} w_j F_j = w^\top F$. Problem (14.2) becomes:

$$\begin{aligned} RS(F) = {} & \arg\min \frac{1}{2} (F - \mu_F)^\top \Sigma_F^{-1} (F - \mu_F) \\ \text{s.t. } & w^\top F = RS(L(w)) \end{aligned}$$


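The Lagrange function is:

$$\mathcal{L}(F; \lambda) = \frac{1}{2} (F - \mu_F)^\top \Sigma_F^{-1} (F - \mu_F) - \lambda \left( w^\top F - RS(L(w)) \right)$$

We deduce the first-order condition:

$$\frac{\partial \mathcal{L}(F; \lambda)}{\partial F} = \Sigma_F^{-1} (F - \mu_F) - \lambda w = 0$$

It follows that $F = \mu_F + \lambda \Sigma_F w$. Since we have $w^\top F = w^\top \mu_F + \lambda w^\top \Sigma_F w$, we obtain:

$$\lambda = \frac{RS(L(w)) - w^\top \mu_F}{w^\top \Sigma_F w}$$

and:

$$RS(F) = \mu_F + \frac{\Sigma_F w}{w^\top \Sigma_F w} \left( RS(L(w)) - w^\top \mu_F \right) \qquad (14.3)$$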


Another approach for solving the inverse problem is to consider the joint distribution of $F$ and $L(w)$:

$$\begin{pmatrix} F \\ L(w) \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_F \\ w^\top \mu_F \end{pmatrix}, \begin{pmatrix} \Sigma_F & \Sigma_F w \\ w^\top \Sigma_F & w^\top \Sigma_F w \end{pmatrix} \right)$$

Using Appendix A.2.2.4 on page 1062, we deduce that the conditional distribution of $F$ given $L(w) = RS(L(w))$ is Gaussian:

$$F \mid L(w) = RS(L(w)) \sim N\left( \mu_{F|L(w)}, \Sigma_{F|L(w)} \right)$$

where:

$$\mu_{F|L(w)} = \mu_F + \frac{\Sigma_F w}{w^\top \Sigma_F w} \left( RS(L(w)) - w^\top \mu_F \right)$$

and:

$$\Sigma_{F|L(w)} = \Sigma_F - \frac{\Sigma_F w w^\top \Sigma_F}{w^\top \Sigma_F w}$$

We know that the maximum of the probability density function of the multivariate normal distribution is reached when the random vector is exactly equal to the mean. We deduce that:

$$RS(F) = \mu_{F|L(w)} = \mu_F + \frac{\Sigma_F w}{w^\top \Sigma_F w} \left( RS(L(w)) - w^\top \mu_F \right) \qquad (14.4)$$

16 We notice that maximizing the density is equivalent to maximizing its logarithm.

Example 162 We assume that $F = (F_1, F_2)$, $\mu_F = (5, 8)$, $\sigma_F = (1.5, 3.0)$ and $\rho(F_1, F_2) = -50\%$. The sensitivity vector $w$ to the risk factors is equal to $(10, 3)$.

The stress scenario is the collection of univariate stress scenarios at the 99% confidence level:

$$S(F_1) = 5 + 1.5 \cdot \Phi^{-1}(99\%) = 8.49$$
$$S(F_2) = 8 + 3.0 \cdot \Phi^{-1}(99\%) = 14.98$$

The stressed loss is then equal to:

$$S(L(w)) = 10 \cdot 8.49 + 3 \cdot 14.98 = 129.83$$

We assume that the reverse stressed loss is equal to 129.83. Using Formula (14.4), we deduce that $RS(F_1) = 10.14$ and $RS(F_2) = 9.47$. The reverse stress scenario is very different from the stress scenario even if they give the same loss. In fact, we have $f(S(F_1), S(F_2)) = 0.8135 \cdot 10^{-6}$ and $f(RS(F_1), RS(F_2)) = 4.4935 \cdot 10^{-6}$, meaning that the occurrence probability of the reverse stress scenario is more than five times higher than the occurrence probability of the stress scenario.
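The figures of Example 162 can be checked numerically. The following Python sketch is only an illustration under the assumptions of the example (bivariate Gaussian factors, linear loss); the function names `loss`, `density` and `reverse_stress` are hypothetical and chosen for readability. With unrounded quantiles, the stressed loss evaluates to about 129.83.

```python
import math
from statistics import NormalDist

# Inputs of Example 162
mu = [5.0, 8.0]
sigma = [1.5, 3.0]
rho = -0.5
w = [10.0, 3.0]

# Covariance matrix of the two risk factors
cov = [[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
       [rho * sigma[0] * sigma[1], sigma[1] ** 2]]


def loss(f):
    """Linear portfolio loss w'F."""
    return sum(wj * fj for wj, fj in zip(w, f))


def density(f):
    """Bivariate Gaussian probability density of the risk factors."""
    z = [(f[j] - mu[j]) / sigma[j] for j in range(2)]
    q = (z[0] ** 2 - 2 * rho * z[0] * z[1] + z[1] ** 2) / (1 - rho ** 2)
    norm = 2 * math.pi * sigma[0] * sigma[1] * math.sqrt(1 - rho ** 2)
    return math.exp(-0.5 * q) / norm


def reverse_stress(target_loss):
    """Most likely Gaussian scenario whose loss equals target_loss (Formula 14.4)."""
    cov_w = [sum(cov[i][j] * w[j] for j in range(2)) for i in range(2)]
    var_loss = sum(w[i] * cov_w[i] for i in range(2))
    lam = (target_loss - loss(mu)) / var_loss
    return [mu[i] + lam * cov_w[i] for i in range(2)]


# Univariate stress scenarios at the 99% confidence level
z99 = NormalDist().inv_cdf(0.99)
stress = [mu[j] + sigma[j] * z99 for j in range(2)]
stressed_loss = loss(stress)

# Reverse stress scenario producing the same loss
rs = reverse_stress(stressed_loss)
ratio = density(rs) / density(stress)
```

Both scenarios give the same loss by construction, and `ratio` measures how much more likely the reverse stress scenario is than the collection of univariate stresses.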


The general case   In the general case, we use a copula function $C$ in order to describe the joint distribution of the risk factors. We have:

$$\ln f(F_1, \ldots, F_m) = \ln c\left( \mathbf{F}_1(F_1), \ldots, \mathbf{F}_m(F_m) \right) + \sum_{j=1}^{m} \ln f_j(F_j)$$

where $c(u_1, \ldots, u_m)$ is the copula density, $\mathbf{F}_j$ is the cdf of $F_j$ and $f_j$ is the pdf of $F_j$. Finally, we obtain a non-linear optimization problem subject to a non-linear constraint.

14.2.4.2 Practical solutions


There are very few articles on reverse stress testing, and a lack of statistical methods. However, we can cite Grundke (2011), Kopeliovich et al. (2015), Glasserman et al. (2015) and Grundke and Pliszka (2018). In these research papers, the optimization problem is generally approximated. For instance, Kopeliovich et al. (2015) and Grundke and Pliszka (2018) consider the PCA method to reduce the problem dimension. Glasserman et al. (2015) propose to use the method of empirical likelihood in order to evaluate the probability of a reverse stress test.

From a practical point of view, banks generally use a smaller number of risk factors. This helps to reduce the problem dimension. They can also consider a Gaussian approximation. In fact, the main difficulty lies in the equality constraint. This is why they generally consider the following optimization problem:

$$\begin{aligned} (RS(F_1), \ldots, RS(F_m)) = {} & \arg\max \ln f(F_1, \ldots, F_m) \\ \text{s.t. } & \ell(S(F_1), \ldots, S(F_m); w) \geq S(L(w)) \end{aligned}$$

In this case, they can use the Monte Carlo simulation method to estimate the reverse stress scenario.
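A minimal sketch of such a Monte Carlo estimate is given below. It is not a method taken from the cited papers: it simply simulates the risk factors, keeps the draws whose loss reaches the threshold, and returns the draw with the highest density. The Gaussian parameters are borrowed from Example 162, while the loss threshold of 100, the number of simulations and the function names are illustrative assumptions (a threshold of 129.83 would be reached too rarely for a plain simulation of this size).

```python
import math
import random

# Assumed Gaussian setting (parameters borrowed from Example 162)
MU = (5.0, 8.0)
SIGMA = (1.5, 3.0)
RHO = -0.5
W = (10.0, 3.0)


def loss(f):
    """Linear portfolio loss w'F."""
    return W[0] * f[0] + W[1] * f[1]


def log_density(f):
    """Logarithm of the bivariate Gaussian density of the risk factors."""
    z1 = (f[0] - MU[0]) / SIGMA[0]
    z2 = (f[1] - MU[1]) / SIGMA[1]
    q = (z1 * z1 - 2 * RHO * z1 * z2 + z2 * z2) / (1 - RHO ** 2)
    norm = 2 * math.pi * SIGMA[0] * SIGMA[1] * math.sqrt(1 - RHO ** 2)
    return -0.5 * q - math.log(norm)


def draw(rng):
    """Simulate one correlated pair of risk factors."""
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    z2 = RHO * e1 + math.sqrt(1 - RHO ** 2) * e2
    return (MU[0] + SIGMA[0] * e1, MU[1] + SIGMA[1] * z2)


def mc_reverse_stress(threshold, n_sims=100_000, seed=0):
    """Most likely simulated scenario whose loss is at least `threshold`."""
    rng = random.Random(seed)
    best, best_ld = None, -math.inf
    for _ in range(n_sims):
        f = draw(rng)
        if loss(f) >= threshold:
            ld = log_density(f)
            if ld > best_ld:
                best, best_ld = f, ld
    return best


# Hypothetical loss threshold used for illustration only
scenario = mc_reverse_stress(100.0)
```

In this linear Gaussian setting, the estimate can be compared with the closed-form solution (14.4) evaluated at the same loss level. The simulated scenario can never have a higher density than the analytical one, and it gets closer to it as the number of simulations increases.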


14.3 Exercises 
14.3.1 Construction of a stress scenario with the GEV distribution 


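1. We note $a_n$ and $b_n$ the normalization constants and $G$ the limit distribution of the Fisher-Tippett theorem.

(a) Find the limit distribution $G$ when $X \sim \mathcal{E}(\lambda)$, $a_n = \lambda^{-1}$ and $b_n = \lambda^{-1} \ln n$.

(b) Same question when $X \sim \mathcal{U}_{[0,1]}$, $a_n = n^{-1}$ and $b_n = 1 - n^{-1}$.

(c) Same question when $X$ is a Pareto distribution $\mathcal{P}(\alpha, \theta)$:

$$F(x) = 1 - \left( \frac{\theta + x}{\theta} \right)^{-\alpha}$$

and the normalization constants are $a_n = \theta \alpha^{-1} n^{1/\alpha}$ and $b_n = \theta n^{1/\alpha} - \theta$.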


2. We denote by $G$ the GEV probability distribution:

$$G(x) = \exp\left( -\left[ 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/\xi} \right)$$

What is the interest of this probability distribution? Write the log-likelihood function associated to the sample $\{x_1, \ldots, x_T\}$.


3. Show that for $\xi \to 0$, the distribution $G$ tends toward the Gumbel distribution:

$$\Lambda(x) = \exp\left( -\exp\left( -\left( \frac{x - \mu}{\sigma} \right) \right) \right)$$

4. We consider the minimum value of daily returns of a portfolio for a period of $n$ trading days. We then estimate the GEV parameters associated to the sample of the opposite of the minimum values. We assume that $\xi$ is equal to 1.

(a) Show that we can approximate the portfolio loss (in %) associated to the return period $T$ with the following expression:

$$r(T) \simeq -\left( \hat{\mu} + \left( \frac{T}{n} - 1 \right) \hat{\sigma} \right)$$

where $\hat{\mu}$ and $\hat{\sigma}$ are the ML estimates of the GEV parameters.

(b) We set $n$ equal to 21 trading days. We obtain the following results for two portfolios:

Portfolio | $\hat{\mu}$ | $\hat{\sigma}$ | $\xi$
#1 | 1% | 3% | 1
#2 | 10% | 2% | 1

Calculate the stress scenario for each portfolio when the return period is equal to one year. Comment on these results.


14.3.2 Conditional expectation and linearity 


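We consider the bivariate Gaussian random vector $(X, Y)$:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix} \right)$$

1. Using the conditional distribution theorem, show that:

$$Y = \beta_0 + \beta X + \sigma U$$

where $U \sim N(0, 1)$. Give the expressions of $\beta_0$, $\beta$ and $\sigma$.

2. Deduce the conditional expectation function $m(x)$:

$$m(x) = E[Y \mid X = x]$$

3. Let $(\tilde{X}, \tilde{Y})$ be the log-normal random vector such that $\tilde{X} = \exp(X)$ and $\tilde{Y} = \exp(Y)$. Find the conditional expectation function $\tilde{m}(x)$:

$$\tilde{m}(x) = E[\tilde{Y} \mid \tilde{X} = x]$$

4. Comment on these results.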


14.3.3 Conditional quantile and linearity 


Let $X$ and $Y$ be an $n \times 1$ random vector and a random variable. We assume that $(X, Y)$ is Gaussian:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{x,x} & \Sigma_{x,y} \\ \Sigma_{y,x} & \Sigma_{y,y} \end{pmatrix} \right)$$

We note $F_x(x)$ and $F_y(y)$ the marginal distributions, and $F(x, y)$ the joint distribution.

1. Calculate the conditional distribution $F(y \mid X = x)$ of the random variable $Y(x) = Y \mid X = x$. Deduce the conditional quantile defined by:

$$q_\alpha(x) = \inf\{q : \Pr\{Y(x) \leq q\} \geq \alpha\}$$

2. Show that:

$$q_\alpha(x) = \beta_0(\alpha) + \beta^\top x$$

where $\beta_0(\alpha)$ is a function that depends on the confidence level $\alpha$.

3. Compare $q_\alpha(x)$ with the conditional expectation $m(x)$. Deduce the main difference between linear regression and quantile regression.

4. We consider an exponential default time $\tau \sim \mathcal{E}(\lambda)$ that depends on the risk factors $X$. Moreover, we assume that $X$ is Gaussian $N(\mu_x, \Sigma_{x,x})$ and the dependence between the default time $\tau$ and the risk factors $X$ is a Normal copula. Find the conditional quantile function $q_\alpha^{\tau}(x)$ of the random variable $\tau(x) = \tau \mid X = x$.

5. We now consider the probability of default PD associated to the default time $\tau \sim \mathcal{E}(\lambda)$. Calculate the conditional quantile function $q_\alpha^{PD}(x)$ of the random variable $PD(x) = PD \mid X = x$.

6. We consider the single factor case where $X \sim N(\mu_x, \sigma_x^2)$ and we assume that the parameter of the Normal copula between $\tau$ and $X$ is equal to $\rho$. Show that:

$$q_\alpha^{PD}(x) = \Phi\left( \Phi^{-1}(\alpha) \sqrt{1 - \rho^2} + \rho \, \frac{x - \mu_x}{\sigma_x} \right)$$

7. Comment on these results and propose a quantile regression model to stress the probability of default.



Chapter 15 


Credit Scoring Models 


Credit scoring refers to statistical models to measure the creditworthiness of a person or a 
company. They have progressively replaced judgemental systems and are now widely used by 
financial and banking institutions that check the credit rating and capacity of the borrower 
before approving a loan. Therefore, credit scoring is at the heart of the decision-making
system for granting credit. This is particularly true for consumer credit (mortgage, credit 
card, personal loan, etc.). Credit scoring models are also used for commercial firms, but 
their final outputs are generally not sufficient for making a decision. For instance, they can 
be completed with the knowledge of the relationship manager on the company. 


Credit scoring first emerged in the United States. For instance, one of the oldest credit 
scores is the FICO score that was introduced in 1989 by Fair Isaac Corporation. The FICO 
score is based on consumer credit files of consumer credit reporting agencies such as Ex- 
perian, Equifax and TransUnion. It remains today the best-known and most-used external 
scoring system in the world. In thirty years, credit scoring models have evolved considerably, 
and financial institutions have generally built their own internal credit scoring system. In 
particular, the development of credit scoring techniques has speeded up in the 2000s with 
the introduction of the IRB formula in the Basel II Accord. For instance, they are now used 
for estimating the probability of default or the loss given default, while validation and back- 
testing procedures are better defined. The estimation of credit scores has also benefitted 
from the massive development of marketing scores, big data and machine learning. 


15.1 The method of scoring 
15.1.1 The emergence of credit scoring 
15.1.1.1 Judgmental credit systems versus credit scoring systems 


The underlying idea of credit valuation is to use the experience in order to approve or 
deny the credit of a (new) customer. In the case of judgmental credit analysis, the decision 
is made by a credit analyst or the relationship manager, and is based on the character, 
the capacity and the capital of the borrower. Past experience of the credit analyst is then 
fundamental, and two credit analysts may give two different answers. Moreover, it takes 
many years to build a track record, because it is not an industrial process. Indeed, the 
credit analyst can analyze only a limited number of requests per week. Because of the high 
costs, financial institutions have sought to automate credit decisions. 


In 1941, Durand presented a statistical analysis of credit valuation. He showed that credit 
analysts use similar factors, and proposed a credit rating formula based on nine factors:
(1) age, (2) sex, (3) stability of residence, (4) occupation, (5) industry, (6) stability of 
employment, (7) bank account, (8) real estate and (9) life insurance. The score is additive 
and can take values between 0 and 3.46. For instance, 0.40 is added to the score if the
applicant is a woman, 0.30 if the applicant is 50 years old or more, etc. Durand’s formula is
the first credit scoring model that has been published. Such credit scoring models became
more and more popular in financial institutions in the 1950s and 1960s, but the real turning
point was the development of the credit card business in the 1970s (Thomas, 2000). From an
industrial point of view, a credit scoring system has two main advantages compared to a 
judgmental credit system: 


1. it is cost efficient, and can treat a huge number of applicants; 
2. the decision-making process is rapid and consistent across customers.


Generally, financial institutions also consider that credit scoring systems are more efficient 
and successful than judgmental credit systems. However, comparing track records is always 
a difficult exercise since it depends on many factors. Some credit analysts may have a very 
good track record, while the live performance of some statistical credit models may be worse 
than their backtest performance. Nevertheless, the case of credit cards has demonstrated 
that credit scoring models are far better than judgmental credit systems. The main reason 
is the large amount of data that can be analyzed by a statistical model. While experience 
is essential for a credit analyst, the efficiency of credit scoring depends on the quality and 
amount of data. 


15.1.1.2 Scoring models for corporate bankruptcy 


These models appeared with the research of Tamari (1966), who proposed combining
several financial ratios to assess the financial health of corporate firms. Nevertheless,
the weight of each ratio was assumed to be fixed and was calibrated arbitrarily. The
empirical work of Beaver (1966) was more interesting since he estimated the univariate 
statistical relationship between financial ratios and the failure. However, the seminal paper 
for the evaluation of creditworthiness is the publication of Altman (1968). Using a small 
dataset and the statistical method of discriminant analysis, he introduced the concept of 
z-score for predicting bankruptcy of commercial firms. The score was equal to: 


$$Z = 1.2 \cdot X_1 + 1.4 \cdot X_2 + 3.3 \cdot X_3 + 0.6 \cdot X_4 + 1.0 \cdot X_5$$

where the variables $X_j$ represent the following financial ratios:

$X_j$ | Ratio
$X_1$ | Working capital / Total assets
$X_2$ | Retained earnings / Total assets
$X_3$ | Earnings before interest and tax / Total assets
$X_4$ | Market value of equity / Total liabilities
$X_5$ | Sales / Total assets


If we note $Z_i$ the score of the firm $i$, we can calculate the normalized score $Z_i^\star = (Z_i - m_z)/\sigma_z$ where $m_z$ and $\sigma_z$ are the mean and standard deviation of the observed scores. $Z_i^\star$ can then be compared to the quantiles of the Gaussian distribution or the empirical distribution. A low value of $Z_i^\star$ (for instance $Z_i^\star < 2.5$) indicates that the firm has
a high probability of default. Today, the technique of z-score, which consists of normalizing 
a score, is very popular and may be found in many fields of economics, finance, marketing 
and statistics. 
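The computation of the score and its normalization can be sketched in a few lines of Python. The five weights are those of the formula above; the three firms and their ratios are purely hypothetical values used for illustration, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator is meant.

```python
from statistics import mean, stdev

# Weights of the five financial ratios X1, ..., X5 in Altman's formula
ALTMAN_WEIGHTS = (1.2, 1.4, 3.3, 0.6, 1.0)


def altman_z(ratios):
    """Altman score from the five ratios (X1, ..., X5)."""
    if len(ratios) != len(ALTMAN_WEIGHTS):
        raise ValueError("exactly five financial ratios are expected")
    return sum(w * x for w, x in zip(ALTMAN_WEIGHTS, ratios))


def normalize(scores):
    """Normalized scores (Z - m) / s computed over the observed scores."""
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (assumption)
    return [(z - m) / s for z in scores]


# Hypothetical firms: (X1, X2, X3, X4, X5)
firms = {
    "A": (0.20, 0.30, 0.10, 1.50, 1.00),
    "B": (0.05, 0.10, 0.02, 0.40, 0.80),
    "C": (-0.10, -0.20, -0.05, 0.20, 0.50),
}

scores = {name: altman_z(ratios) for name, ratios in firms.items()}
z_star = dict(zip(scores, normalize(list(scores.values()))))
```

With these illustrative inputs, firm C obtains the lowest raw score and therefore the lowest normalized score, which is the one that would be flagged first.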


15.1.1.3 New developments 


Since the publication of Durand (1941) and Altman (1968), the research on credit scoring can be split into three main categories:

• The first category concerns the default of corporate firms. It appears that the choice of financial ratios and relevant metrics as explanatory variables is more important than the model itself (Hand, 2006). Other factors such as the business cycle, economic conditions or market prices (Hillegeist et al., 2004) may be taken into account. Moreover, the one-size-fits-all approach is not appropriate and credit scoring models are different for stock-listed companies, medium-sized companies, financial companies or industrial companies (Altman et al., 2010).

• The second category focuses on consumer credit and retail debt management (credit cards, mortgages, etc.). Sample sizes are larger than for corporate credit (Thomas, 2000) and may justify the use of more sophisticated techniques that include the behavior of the customer (Thomas et al., 2017).

• The third research direction concerns statistical methods. Besides discriminant analysis, new approaches have been proposed, in particular logit or probit models (Ohlson, 1980; Lennox, 1999) and survival models (Shumway, 2001). Moreover, with the availability of more personal data, machine learning techniques such as neural networks (West, 2000) are also used and tested in credit scoring and are no longer reserved for marketing scores.


15.1.2 Variable selection 
15.1.2.1 Choice of the risk factors 


Variables used to determine the creditworthiness of a borrower are generally based on 5 risk factor categories, also called the five Cs:

1. Capacity measures the applicant’s ability to meet the loan payments. For example, lenders may look at the debt-to-income ratio or the job stability of the applicant. In the case of corporate firms, the cash flow dynamics is a key element.

2. Capital is the size of assets that are held by the borrower. In the case of consumer credit, it corresponds to the net wealth of the borrower. For a corporate firm, it can be machinery, equipment, buildings, investment portfolio, etc.

3. Character measures the willingness to repay the loan. For example, the lender can investigate the payment history of the applicant. If the applicant has children, the applicant may have more incentive than if he/she is single.

4. Collateral concerns additional forms of security that the borrower can provide to the lender. This item is particularly important in the case of corporate credit.

5. Conditions refer to the characteristics of the loan and the economic conditions that might affect the borrower. For example, the score is generally a decreasing function of the maturity and the interest paid by the borrower. For corporate firms, some sectors are more dependent on the economic cycle than others.


In Table 15.1, we report some variables that are used when building a consumer credit score. 
This type of score is generally used by banks, since they may include information that is 
related to the banking relationship. 


Scores are developed by banks and financial institutions, but they can also be developed by consultancy companies. This is the case of the FICO® scores, which are the most widely used credit scoring systems in the world¹. They are based on 5 main categories: payment history (35%), amount of debt (30%), length of credit history (15%), new credit (10%) and credit mix (10%). They generally range from 300 to 850, while the average score of US consumers is 695. These scores are generally classified as follows: exceptional (800+), very good (740-799), good (670-739), fair (580-669) and poor (below 580).

TABLE 15.1: An example of risk factors for consumer credit

Category | Risk factor
Character | Age of applicant
 | Marital status
 | Number of children
 | Educational background
 | Time with bank
 | Time at present address
Capacity | Annual income
 | Current living expenses
 | Current debts
 | Time with employer
Capital | Purpose of the loan
 | Home status
 | Saving account
Condition | Maturity of the loan
 | Paid interests


Corporate credit scoring systems use financial ratios: 


1. Profitability: gross profit margin, operating profit margin, return-on-equity (ROE), 
etc. 


2. Solvency: debt-to-assets ratio, debt-to-equity ratio, interest coverage ratio, etc. 


3. Leverage: liabilities-to-assets ratio (financial leverage ratio), long-term debt/assets,
etc. 


4. Liquidity: current assets/current liabilities (current ratio), quick assets/current lia- 
bilities (quick or cash ratio), total net working capital, assets with maturities of less 
than one year, etc. 


Liquidity and solvency ratios measure the company’s ability to satisfy its short-term and long-term obligations, while profitability ratios measure its ability to generate profits from its resources. High profitability, high solvency and high liquidity reduce the probability of default, but a high leverage increases the credit risk of the company. The score may also include non-financial variables: firm age², size (number of employees), quality of accounting information, management quality, etc. For instance, we generally consider that large firms default less often than small firms. Like retail scores, corporate scores are built by banks but also by consulting firms and credit agencies. For example, Moody’s proposes the RiskCalc model (Falkenstein et al., 2000).


1 The FICO scores have been developed since 1989 by Fair Isaac Corporation, which is a California-based firm. There are more than 20 scores that are commonly used for auto lending, credit card decisioning, mortgage lending, etc. In the US, FICO scores are used in over 90% of lending decisions (source: https://www.myfico.com).

2 Recent firms may be penalized.

15.1.2.2 Data preparation

Of course, data quality is essential for building a robust credit scoring model. However, data preparation is not limited to checking the data, removing outliers or filling missing values. Indeed, a ‘one-size-fits-all’ approach is generally not appropriate, because a scoring model is generally more a decision tree system than a parsimonious econometric model. This is why credit scoring is work-intensive on data mining. Once the data is clean, we can begin the phase of exploratory data analysis, which encompasses three concurrent steps: variable transformation, slicing-and-dicing segmentation and potential interaction research. The first step consists in applying a non-linear transformation, for example by computing the logarithm, while the second and third steps are the creation of categorical/piecewise and interaction variables.


Piecewise and dummy variables   Let $b$ be a $p \times 1$ vector of bounds. We assume that $b$ is sorted in ascending order. We note $b^{(L)} = (-\infty, b)$, $b^{(R)} = (b, +\infty)$, $b^{(1)} = (b_1, b)$ and:

$$b^{(2)} = (0, b_2 - b_1, b_3 - b_2, \ldots, b_p - b_{p-1}, 0)$$

It follows that $b^{(L)}$, $b^{(R)}$, $b^{(1)}$ and $b^{(2)}$ are four vectors of dimension $(p + 1) \times 1$. From the vector $b$, we can then create $(p + 1)$ piecewise variables which are defined by:

$$PW_j = \left( X - b_j^{(1)} \right) \cdot \mathbb{1}\left\{ X > b_j^{(L)} \right\} \cdot \mathbb{1}\left\{ X \leq b_j^{(R)} \right\} + b_j^{(2)} \cdot \mathbb{1}\left\{ X > b_j^{(R)} \right\}$$

The underlying idea is to have an affine function if the original variable takes its values in the interval $]b_{j-1}, b_j]$. For instance, Figure 15.1 represents the four piecewise variables which are obtained from $b = (-0.5, 0, 1)$. In a similar way, we define dummy variables as follows:

$$D_j = \mathbb{1}\left\{ X > b_j^{(L)} \right\} \cdot \mathbb{1}\left\{ X \leq b_j^{(R)} \right\}$$

In this case, $D_j$ takes a value of 1 if $X \in \, ]b_{j-1}, b_j]$. Using $b = (-0.5, 0, 1)$, we obtain Figure 15.2.
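The construction can be illustrated with a short Python sketch. It is one possible reading of the definitions above: each class $]b_{j-1}, b_j]$ gets a lower bound, an upper bound, a shift and a cap, and the names `build_bounds`, `piecewise` and `dummies` are hypothetical.

```python
import math


def build_bounds(b):
    """Return the lower-bound, upper-bound, shift and cap vectors built from b."""
    b = list(b)
    lower = [-math.inf] + b                 # (-inf, b)
    upper = b + [math.inf]                  # (b, +inf)
    shift = [b[0]] + b                      # (b_1, b)
    cap = [0.0] + [b[j] - b[j - 1] for j in range(1, len(b))] + [0.0]
    return lower, upper, shift, cap


def piecewise(x, b):
    """The (p + 1) piecewise variables associated with the value x."""
    lower, upper, shift, cap = build_bounds(b)
    values = []
    for lo, up, sh, c in zip(lower, upper, shift, cap):
        if lo < x <= up:
            values.append(x - sh)           # affine part inside the class
        elif x > up:
            values.append(c)                # constant once the class is passed
        else:
            values.append(0.0)              # zero before the class is reached
    return values


def dummies(x, b):
    """The (p + 1) dummy variables: 1 for the class that contains x."""
    lower, upper, _, _ = build_bounds(b)
    return [1 if lo < x <= up else 0 for lo, up in zip(lower, upper)]


# Bounds used for Figures 15.1 and 15.2
b = (-0.5, 0.0, 1.0)
pw = piecewise(0.5, b)
d = dummies(0.5, b)
```

With this construction, the piecewise variables add up to $X - b_1$ for any value of $X$, which gives a simple sanity check of an implementation, and exactly one dummy variable is equal to 1.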


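Optimal slicing   An important point is the choice of the bound $b = (b_1, b_2, \ldots, b_p)$. It is obvious that the optimal values depend on the response variable $Y$. For that, we introduce the contingency table of the random vector $(Y, X)$, which corresponds to a table of counts with $p$ rows and $q$ columns:

$Y / X$ | $X \in \mathcal{I}_1^{(x)}$ | ... | $X \in \mathcal{I}_j^{(x)}$ | ... | $X \in \mathcal{I}_q^{(x)}$
$Y \in \mathcal{I}_1^{(y)}$ | $n_{1,1}$ | ... | $n_{1,j}$ | ... | $n_{1,q}$
$Y \in \mathcal{I}_i^{(y)}$ | $n_{i,1}$ | ... | $n_{i,j}$ | ... | $n_{i,q}$
$Y \in \mathcal{I}_p^{(y)}$ | $n_{p,1}$ | ... | $n_{p,j}$ | ... | $n_{p,q}$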

where $n_{i,j}$ is the number of observations such that $Y \in \mathcal{I}_i^{(y)}$ and $X \in \mathcal{I}_j^{(x)}$. We assume that the sets are disjoint: $\mathcal{I}_{j_1}^{(x)} \cap \mathcal{I}_{j_2}^{(x)} = \emptyset$ for $j_1 \neq j_2$ and $\mathcal{I}_{i_1}^{(y)} \cap \mathcal{I}_{i_2}^{(y)} = \emptyset$ for $i_1 \neq i_2$. We introduce the following notations:

• $n_{i,\cdot} = \sum_{j=1}^{q} n_{i,j}$ is the number of observations such that $Y \in \mathcal{I}_i^{(y)}$;

• $n_{\cdot,j} = \sum_{i=1}^{p} n_{i,j}$ is the number of observations such that $X \in \mathcal{I}_j^{(x)}$;

• $n = \sum_{i=1}^{p} \sum_{j=1}^{q} n_{i,j}$ is the total number of observations³.

FIGURE 15.1: Piecewise variables
FIGURE 15.2: Dummy variables


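If we assume that $X$ and $Y$ are independent (null hypothesis $H_0$), the expected number of observations such that $Y \in \mathcal{I}_i^{(y)}$ and $X \in \mathcal{I}_j^{(x)}$ must be equal to:

$$\bar{n}_{i,j} = \frac{n_{i,\cdot} \times n_{\cdot,j}}{n}$$

Under $H_0$, we can prove that the Pearson’s statistic $\chi$ has a chi-squared limit distribution:

$$\chi = \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{(n_{i,j} - \bar{n}_{i,j})^2}{\bar{n}_{i,j}} \sim \chi_\nu^2$$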

where $\nu = (p - 1)(q - 1)$. If we apply the Pearson’s chi-squared statistic to the previous scoring problem, the contingency table becomes:

$X$ | $X \leq b_1$ | $b_1 < X \leq b_2$ | ... | $b_{p-1} < X \leq b_p$ | $X > b_p$
$Y = 0$ | $n_{0,1}$ | $n_{0,2}$ | ... | $n_{0,p}$ | $n_{0,p+1}$
$Y = 1$ | $n_{1,1}$ | $n_{1,2}$ | ... | $n_{1,p}$ | $n_{1,p+1}$

We assume here that $Y$ is a binary random variable: $Y = 0$ indicates a good credit and $Y = 1$ corresponds to a bad credit. We note $\chi(b)$ the value of the chi-squared statistic that depends on the slicing vector $b$:

$$\chi(b) = \sum_{i=0}^{1} \sum_{j=1}^{p+1} \frac{(n_{i,j} - \bar{n}_{i,j})^2}{\bar{n}_{i,j}}$$

The optimal value of $b$ is defined by:

$$b^\star = \arg\max \chi(b) \qquad (15.1)$$


Indeed, if $X$ and $Y$ are independent, we have $\chi(b) = 0$. In this case, the variable $X$ does not help to predict the variable $Y$. Maximizing the chi-squared statistic is equivalent to finding the slicing setup that deviates the most from the independent case.

In order to solve the maximization problem (15.1), we may use the dynamic programming principle, whose objective function is to solve this problem:

$$\begin{aligned} \{c^\star(k)\} = {} & \arg\max \sum_{k=1}^{K-1} f(k, s(k), c(k)) + f(K, s(K)) \\ \text{s.t. } & \left\{ \begin{array}{l} s(k+1) = g(k, s(k), c(k)) \\ s(k) \in S(k) \\ c(k) \in C(k) \\ s(1) = s \end{array} \right. \end{aligned} \qquad (15.2)$$

The underlying idea is to initialize the algorithm⁴ with a predetermined slice $\{b_1, b_2, \ldots, b_p\}$, and to aggregate the knots in order to find the optimal slice $\{b_1^\star, b_2^\star, \ldots, b_{p^\star}^\star\}$ for a given value of $p^\star$. For that, we note $n_i(b_{j_1}, b_{j_2}) = \#(Y = i, b_{j_1} < X \leq b_{j_2})$. The chi-squared marginal contribution is defined by:

$$\chi(b_{j_1}, b_{j_2}) = \sum_{i=0}^{1} \frac{\left( n_i(b_{j_1}, b_{j_2}) - \bar{n}_i(b_{j_1}, b_{j_2}) \right)^2}{\bar{n}_i(b_{j_1}, b_{j_2})} \quad \text{for } j_1 < j_2$$

3 We also have: $n = \sum_{i=1}^{p} n_{i,\cdot} = \sum_{j=1}^{q} n_{\cdot,j}$.

4 The algorithm is described on page 1049.

$\chi(b_{j_1}, b_{j_2})$ can be viewed as the Pearson’s statistic when we only consider the observations such that $b_{j_1} < X \leq b_{j_2}$. The gain function is equal to:

$$f(k, s(k), c(k)) = \begin{cases} -\infty & \text{if } c(k) \leq s(k) \\ \chi\left( b_{s(k)+1}, b_{c(k)} \right) & \text{otherwise} \end{cases}$$

If $k = 1$, we have:

$$f(1, s(1), c(1)) = \begin{cases} -\infty & \text{if } c(1) \leq s(1) \\ \chi\left( b_0, b_{s(1)} \right) + \chi\left( b_{s(1)+1}, b_{c(1)} \right) & \text{otherwise} \end{cases}$$

The transfer function is defined as follows:

$$s(k+1) = g(k, s(k), c(k)) = c(k)$$

The state variable $s(k)$ and the control variable $c(k)$ take their values in the set $\{1, 2, \ldots, p\}$. The number $K$ of iterations is exactly equal to $p^\star$ and we have:

$$f(K, s(K)) = \chi\left( b_{s(K)+1}, b_p \right)$$

In the case where $p^\star = 1$, the dynamic programming algorithm reduces to the brute force algorithm:

$$j^\star = \arg\max_{b_j \in \{b_1, b_2, \ldots, b_p\}} \chi(-\infty, b_j) + \chi(b_j, \infty)$$

In this case, the optimal slice is composed of two classes: $X \leq b_{j^\star}$ and $X > b_{j^\star}$.


Example 163 We consider 40 observations of the random vector $(Y, X)$. Below, we indicate the values taken by $X$ when $Y = 0$ and $Y = 1$:

• $Y = 0$: −2.0, −1.1, −1.0, −0.7, −0.5, −0.5, −0.4, −0.3, −0.2, −0.2, 0.0, 0.7, 0.8, 0.9, 1.0, 1.4, 1.9, 2.8, 3.2, 3.7.

• $Y = 1$: −5.2, −4.3, −3.6, −2.7, −1.8, −1.5, −1.2, −1.0, −0.8, −0.1, 0.0, 0.2, 0.2, 0.3, 0.5, 0.5, 0.5, 0.7, 0.8, 1.9.

If we consider the following grid $b = (-5, -4, -3, -2, -1, 0, 1, 2, 3)$, we obtain the following contingency table:

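$X$ | $\mathcal{I}_1^{(x)}$ | $\mathcal{I}_2^{(x)}$ | $\mathcal{I}_3^{(x)}$ | $\mathcal{I}_4^{(x)}$ | $\mathcal{I}_5^{(x)}$ | $\mathcal{I}_6^{(x)}$ | $\mathcal{I}_7^{(x)}$ | $\mathcal{I}_8^{(x)}$ | $\mathcal{I}_9^{(x)}$ | $\mathcal{I}_{10}^{(x)}$
$Y = 0$ | 0 | 0 | 0 | 0 | 2 | 8 | 4 | 3 | 1 | 2
$Y = 1$ | 1 | 1 | 1 | 1 | 3 | 3 | 9 | 1 | 0 | 0

where $\mathcal{I}_1^{(x)} = \{X < -5\}$, $\mathcal{I}_2^{(x)} = \{-5 \leq X < -4\}$, ..., $\mathcal{I}_{10}^{(x)} = \{X \geq 3\}$. If we would like to slice $X$ into two classes, we use the brute force algorithm. If we group the intervals $\mathcal{I}_2^{(x)}, \ldots, \mathcal{I}_{10}^{(x)}$, the contingency table becomes:

$X$ | $X < -5$ | $X \geq -5$ | $n_{i,\cdot}$
$Y = 0$ | 0 | 20 | 20
$Y = 1$ | 1 | 19 | 20
$n_{\cdot,j}$ | 1 | 39 | $n = 40$

We deduce that:

$$\chi = \frac{(0 - 0.5)^2}{0.5} + \frac{(20 - 19.5)^2}{19.5} + \frac{(1 - 0.5)^2}{0.5} + \frac{(19 - 19.5)^2}{19.5} = 1.02564$$

If we now consider the two groups $\{\mathcal{I}_1^{(x)}, \mathcal{I}_2^{(x)}\}$ and $\{\mathcal{I}_3^{(x)}, \ldots, \mathcal{I}_{10}^{(x)}\}$, we obtain the following contingency table:

$X$ | $X < -4$ | $X \geq -4$ | $n_{i,\cdot}$
$Y = 0$ | 0 | 20 | 20
$Y = 1$ | 2 | 18 | 20
$n_{\cdot,j}$ | 2 | 38 | $n = 40$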

The associated Pearson’s chi-squared statistic is then equal to:

$$\chi = \frac{(0 - 1.0)^2}{1.0} + \frac{(20 - 19.0)^2}{19.0} + \frac{(2 - 1.0)^2}{1.0} + \frac{(18 - 19.0)^2}{19.0} = 2.10526$$

We can proceed in the same way with the other values of $b$ and we obtain the following values of $\chi$ when the cut-off is $b_j$:

$\chi$ | $b_1$ | $b_2$ | $b_3$ | $b_4$ | $b_5$ | $b_6$ | $b_7$ | $b_8$ | $b_9$

If we prefer to slice $X$ into three classes, the dynamic programming algorithm finds that the optimal cut-offs are $b^\star = (-2, 1)$. In the case of four classes, the optimal slicing is:

$X$ | $X < -1$ | $-1 \leq X < 0$ | $0 \leq X < 1$ | $X \geq 1$

and the optimal value $\chi^\star$ is equal to 10.545. In order to understand how the dynamic programming algorithm works, we report the $J$ and $C$ matrices in Table 15.2. We notice that the optimal value is $J(1, s^\star(1)) = 10.545$ where $s^\star(1)$ is the 5th state. Moreover, the optimal controls are $c^\star(1) = 6$ and $c^\star(2) = 7$, implying that $s^\star(2) = c^\star(1)$ and $s^\star(3) = c^\star(2)$ are the 6th and 7th states. This is why the optimal cut-offs are $b^\star = (-1, 0, 1)$, that is the 5th, 6th and 7th elements of the initial vector $b$.
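The results of Example 163 can be reproduced with an exhaustive search, which is feasible here because the grid only contains nine knots. The sketch below is not the dynamic programming algorithm of the text: it enumerates all the combinations of cut-offs and keeps the one with the highest Pearson statistic. The function names are hypothetical. The counts of the example are recovered when an observation equal to a knot is assigned to the class on its right, which is the convention used in the code.

```python
from bisect import bisect_right
from itertools import combinations

# Observations of Example 163
X0 = [-2.0, -1.1, -1.0, -0.7, -0.5, -0.5, -0.4, -0.3, -0.2, -0.2,
      0.0, 0.7, 0.8, 0.9, 1.0, 1.4, 1.9, 2.8, 3.2, 3.7]      # Y = 0
X1 = [-5.2, -4.3, -3.6, -2.7, -1.8, -1.5, -1.2, -1.0, -0.8, -0.1,
      0.0, 0.2, 0.2, 0.3, 0.5, 0.5, 0.5, 0.7, 0.8, 1.9]      # Y = 1
GRID = (-5, -4, -3, -2, -1, 0, 1, 2, 3)


def contingency(cuts):
    """2 x (len(cuts) + 1) table of counts; x equal to a cut goes to the right."""
    table = [[0] * (len(cuts) + 1) for _ in range(2)]
    for i, sample in enumerate((X0, X1)):
        for x in sample:
            table[i][bisect_right(cuts, x)] += 1
    return table


def pearson(table):
    """Pearson's chi-squared statistic of a contingency table."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected > 0:  # skip empty classes
                stat += (n_ij - expected) ** 2 / expected
    return stat


def best_slicing(n_cuts):
    """Exhaustive search of the n_cuts cut-offs maximizing the statistic."""
    return max(combinations(GRID, n_cuts),
               key=lambda cuts: pearson(contingency(cuts)))


two_classes = best_slicing(1)
four_classes = best_slicing(3)
chi_star = pearson(contingency(four_classes))
```

The exhaustive search visits a number of combinations that grows very quickly with the size of the grid, which is why the text relies on dynamic programming for finer grids.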


Remark 178 We notice that the optimal slice $b^\star$ depends on the initial grid $b$. This implies that another grid $b$ will not necessarily give the same optimal slice. For instance, we have used a step of 1 in the previous example. If we use a step of 0.2, we obtain the optimal solution $b^\star = (-0.8, -0.2, 0.6)$. We have reported the corresponding slicing in Figure 15.3. In this case, the Pearson’s chi-squared statistic is equal to 18.444, which is better than the value 10.545 obtained previously. This is why it is better to use a small step than a large step. The risk is that the dynamic programming algorithm produces some classes with a low number of observations. To prevent this possible overfitting, we can impose that $\chi(b_{j_1}, b_{j_2}) = -\infty$ when the number of observations is below a given threshold ($\#(b_{j_1} < X \leq b_{j_2}) < n_{\min}$). This ensures that each optimized class has at least $n_{\min}$ observations.

TABLE 15.2: Dynamic programming matrices J and C

state | $J(1, s(1))$ | $J(2, s(2))$ | $J(3, s(3))$ | $c(1)$ | $c(2)$
1 | 7.6059 | 4.0714 | 0.0256 | 4 | 7
2 | 7.7167 | 3.8618 | 0.1053 | 6 | 7
3 | 9.0239 | 3.7048 | 0.2432 | 6 | 7
4 | 10.4945 | 3.6059 | 0.4444 | 6 | 7
5 | 10.5450 | 3.5714 | 0.8065 | 6 | 7
6 | 5.9231 | 5.4945 | 0.0000 | 7 | 7
7 | 4.7576 | 4.0000 | 3.5714 | 8 | 8
8 | −∞ | 3.0000 | 3.0000 | 1 | 9
9 | −∞ | −∞ | 2.0000 | 1 | 1

FIGURE 15.3: Optimal slicing with four classes
15.1.2.3 Variable selection

In practice, one may have many candidate variables $X = (X_1, \ldots, X_m)$ for explaining the variable $Y$. The variable selection problem consists in finding the best set of optimal variables. Let us assume the following statistical model:

$$Y = f(X) + u$$

where $u \sim N(0, \sigma^2)$. We denote the prediction by $\hat{Y} = \hat{f}(X)$. By assuming the standard statistical hypotheses, we obtain:

$$\begin{aligned} E\left[ (Y - \hat{Y})^2 \right] &= E\left[ \left( f(X) + u - \hat{f}(X) \right)^2 \right] \\ &= \left( E[\hat{f}(X)] - f(X) \right)^2 + E\left[ \left( \hat{f}(X) - E[\hat{f}(X)] \right)^2 \right] + \sigma^2 \\ &= \text{Bias}^2 + \text{Variance} + \text{Error} \end{aligned}$$

Hastie et al. (2009) decompose the mean squared error of $\hat{f}(X)$ into three terms: a bias component, a variance component and an irreducible error. This bias-variance decomposition depends on the complexity of the model. When the model complexity is low (i.e. when there is a low number of regressors), the estimator $\hat{f}(X)$ generally presents a high bias but a low variance. When the model complexity is high (i.e. when there is a high number of regressors), the estimator $\hat{f}(X)$ generally presents a low bias but a high variance. The underlying idea of variable selection is then to optimize the bias-variance trade-off.


Best subset selection A first approach is to find the best subset of size $k$ for $k \in \{1, \ldots, m\}$
that gives the smallest residual sum of squares. It follows that the search is
performed through $2^m$ possible subsets, meaning that we rapidly face a combinatorial
explosion. Moreover, minimizing the residual sum of squares is equivalent to considering the
largest subset $\{1, \ldots, m\}$, because adding a regressor can never increase the residual sum
of squares. This is why we prefer to consider an information criterion that
penalizes the degree of freedom of the model. For instance, the Akaike criterion is defined
as follows:

$$\mathrm{AIC}(\alpha) = -2\,\ell^{(k)}(\hat{\theta}) + \alpha \cdot \mathrm{df}^{(\mathrm{model})}_{(k)}$$

where $\ell^{(k)}(\hat{\theta})$ and $\mathrm{df}^{(\mathrm{model})}_{(k)}$ are the log-likelihood and the degree of freedom of the $k$-th
model⁵. Therefore, the best model corresponds to the model that minimizes the Akaike
criterion. In practice, the penalization parameter is generally set to $\alpha = 2$. In the case of
the previous model, we deduce that:

$$\mathrm{AIC}(2) = n \ln\left(\frac{\mathrm{RSS}(\hat{\theta}_{(k)})}{n}\right) + 2\,\mathrm{df}^{(\mathrm{model})}_{(k)}$$
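As a rough illustration of best subset selection with the Akaike criterion, the sketch below enumerates every non-empty subset of regressors and keeps the one that minimizes $n \ln(\mathrm{RSS}/n) + 2\,\mathrm{df}$. The function names, the absence of an intercept and the simulated data are assumptions made for this example only, and the exhaustive loop is practical only for small $m$.

```python
import itertools
import numpy as np

def aic_ols(y, X_sub):
    """AIC(2) = n ln(RSS / n) + 2 df for an OLS fit on the given columns."""
    n, df_model = X_sub.shape
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = float(np.sum((y - X_sub @ beta) ** 2))
    return n * np.log(rss / n) + 2 * df_model

def best_subset(y, X):
    """Return (aic, columns) of the AIC-minimizing subset among the 2^m - 1 candidates."""
    m = X.shape[1]
    best_aic, best_cols = np.inf, ()
    for k in range(1, m + 1):
        for cols in itertools.combinations(range(m), k):
            aic = aic_ols(y, X[:, list(cols)])
            if aic < best_aic:
                best_aic, best_cols = aic, cols
    return best_aic, best_cols

# Simulated data (hypothetical): only columns 0 and 2 drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=200)
aic, cols = best_subset(y, X)
print(cols, round(aic, 2))
```

Because the penalty is only two units per parameter, the selected subset always contains the truly informative columns here, but it may occasionally keep a noise column as well.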


Stepwise approach Another way of selecting variables is to use sequential approaches:
forward selection, backward selection and forward/backward combined selection. In the case
of forward selection, we start with the intercept and include one variable at a time.
At each step, we select the model of dimension $k + 1$ with the most significant $F$-value with
respect to the previous optimal model of dimension $k$:

$$F = \frac{\mathrm{RSS}(\hat{\theta}_{(k)}) - \mathrm{RSS}(\hat{\theta}_{(k+1)})}{\mathrm{RSS}(\hat{\theta}_{(k+1)}) \,/\, \mathrm{df}^{(\mathrm{residual})}_{(k+1)}}$$


⁵ $\mathrm{df}^{(\mathrm{model})}_{(k)}$ is a complexity measure of the model, and corresponds to the number of estimated parameters.
It is sometimes called the ‘model degree of freedom’ whereas the classical measure used in linear regression
t-statistics $\mathrm{df}^{(\mathrm{residual})}_{(k)}$ is called the ‘residual degree of freedom’. We have the following relationship
$\mathrm{df}^{(\mathrm{residual})}_{(k)} = n - \mathrm{df}^{(\mathrm{model})}_{(k)}$ where $n$ is the number of observations.

We stop when no model produces a significant $F$-value at the 95% confidence level. In the
case of backward selection, we start with all the variables and remove one variable at a
time. At each step, we select the model of dimension $k$ with the smallest significant $F$-value
with respect to the previous optimal model of dimension $k+1$. The forward/backward
combined procedure consists in using a forward step followed by a backward step, and
iterating this loop until the convergence criterion is reached. The convergence criterion can
be expressed as a maximum number of loops⁶.
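The forward selection loop can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and the critical value `f_crit` is passed by the user rather than computed from the Fisher distribution (a value around 4 roughly corresponds to the 95% level with one numerator degree of freedom and a moderately large residual degree of freedom).

```python
import numpy as np

def rss_and_df(y, X, cols):
    """RSS and number of parameters of an OLS fit with an intercept plus the columns in cols."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid), Z.shape[1]

def forward_selection(y, X, f_crit=4.0):
    """Add one variable at a time while the best F-value exceeds f_crit."""
    n, m = X.shape
    selected = []
    rss_k, _ = rss_and_df(y, X, selected)          # intercept-only model
    while len(selected) < m:
        best = None
        for c in range(m):
            if c in selected:
                continue
            rss_new, df_model = rss_and_df(y, X, selected + [c])
            # F = (RSS_k - RSS_{k+1}) / (RSS_{k+1} / df_residual)
            f_value = (rss_k - rss_new) / (rss_new / (n - df_model))
            if best is None or f_value > best[0]:
                best = (f_value, c, rss_new)
        if best[0] < f_crit:                       # no significant candidate: stop
            break
        selected.append(best[1])
        rss_k = best[2]
    return selected

# Simulated data (hypothetical): only columns 0 and 2 drive y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=200)
selected = forward_selection(y, X)
print(selected)
```

Backward selection follows the same pattern, starting from all the columns and removing the variable with the smallest F-value at each step.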


Lasso approach The lasso method consists in adding an $L_1$ penalty function to the
optimization function in order to obtain a sparse parameter vector $\theta$:

$$L_1(\theta) = \|\theta\|_1 = \sum_{k=1}^{K} |\theta_k|$$

For example, the lasso regression model is specified as follows (Tibshirani, 1996):

$$y_i = \sum_{k=1}^{K} \beta_k x_{i,k} + u_i \quad \text{s.t.} \quad \sum_{k=1}^{K} |\beta_k| \le \tau$$

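where $\tau$ is a scalar to control the sparsity. Using the notations introduced on page 604, we
have:

$$
\begin{aligned}
\hat{\beta}(\tau) = {} & \arg\min\; (Y - X\beta)^\top (Y - X\beta) \\
& \text{s.t.} \quad \|\beta\|_1 \le \tau
\end{aligned}
\tag{15.3}
$$

This problem is equivalent to the Lagrange optimization program $\hat{\beta}(\lambda) = \arg\min \mathcal{L}(\beta; \lambda)$
where⁷:

$$
\begin{aligned}
\mathcal{L}(\beta; \lambda) &= \frac{1}{2} (Y - X\beta)^\top (Y - X\beta) + \lambda \|\beta\|_1 \\
&\propto \frac{1}{2} \beta^\top (X^\top X) \beta - \beta^\top (X^\top Y) + \lambda \|\beta\|_1
\end{aligned}
$$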


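The solution $\hat{\beta}(\lambda)$ can be found by solving the augmented QP program where $\beta = \beta^+ - \beta^-$
under the constraints $\beta^+ \ge 0$ and $\beta^- \ge 0$. We deduce that:

$$
\begin{aligned}
\|\beta\|_1 &= \sum_{k=1}^{K} \left|\beta_k^+ - \beta_k^-\right| \\
&= \sum_{k=1}^{K} \left|\beta_k^+\right| + \sum_{k=1}^{K} \left|\beta_k^-\right| \\
&= \mathbf{1}^\top \beta^+ + \mathbf{1}^\top \beta^-
\end{aligned}
$$

Since we have:

$$\beta = \begin{pmatrix} I_K & -I_K \end{pmatrix} \begin{pmatrix} \beta^+ \\ \beta^- \end{pmatrix}$$

the augmented QP program is specified as follows:

$$
\begin{aligned}
\hat{\theta} = {} & \arg\min\; \frac{1}{2} \theta^\top Q \theta - \theta^\top R \\
& \text{s.t.} \quad \theta \ge 0
\end{aligned}
$$

where $\theta = (\beta^+, \beta^-)$, $\tilde{X} = \begin{pmatrix} X & -X \end{pmatrix}$, $Q = \tilde{X}^\top \tilde{X}$ and $R = \tilde{X}^\top Y - \lambda \cdot \mathbf{1}$. If we denote
$A = \begin{pmatrix} I_K & -I_K \end{pmatrix}$, we obtain:

$$\hat{\beta}(\lambda) = A \hat{\theta}$$

⁶ The algorithm also stops when the variable to be added is the same as the last deleted variable.

⁷ $\tau$ and $\lambda$ are related by the relationship $\tau = \|\hat{\beta}(\lambda)\|_1$.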
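The augmented QP program can be checked numerically with a few lines of code. The sketch below is only illustrative: instead of a dedicated QP solver, it uses a simple projected gradient iteration on $\theta \ge 0$ (a substitution made here for self-containment, not a method prescribed by the text), and it compares the result with the soft-thresholding formula, which gives the lasso solution in closed form when the columns of $X$ are orthonormal. All names and the simulated data are assumptions.

```python
import numpy as np

def lasso_qp(X, y, lam, n_iter=5000):
    """Solve min 0.5 theta'Q theta - theta'R s.t. theta >= 0 and return beta = A theta."""
    K = X.shape[1]
    X_aug = np.hstack([X, -X])                       # X~ = (X, -X)
    Q = X_aug.T @ X_aug
    R = X_aug.T @ y - lam * np.ones(2 * K)
    step = 1.0 / np.linalg.eigvalsh(Q)[-1]           # 1 / largest eigenvalue of Q
    theta = np.zeros(2 * K)
    for _ in range(n_iter):
        grad = Q @ theta - R
        theta = np.maximum(theta - step * grad, 0.0) # projection on theta >= 0
    A = np.hstack([np.eye(K), -np.eye(K)])
    return A @ theta

# Simulated data (hypothetical) with orthonormal regressors
rng = np.random.default_rng(2)
X = np.linalg.qr(rng.normal(size=(50, 4)))[0]
y = X @ np.array([3.0, -2.0, 0.1, 0.0]) + 0.01 * rng.normal(size=50)
lam = 0.5

beta_qp = lasso_qp(X, y, lam)
b_ols = X.T @ y                                      # OLS estimate when X'X = I
beta_soft = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)
print(np.round(beta_qp, 4), np.round(beta_soft, 4))
```

The last two coefficients are shrunk exactly to zero, which is the sparsity property discussed in the text.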


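Remark 179 If we consider Problem (15.3), we can also solve it using another augmented
QP program:

$$
\begin{aligned}
\hat{\theta} = {} & \arg\min\; \frac{1}{2} \theta^\top Q \theta - \theta^\top R \\
& \text{s.t.} \quad \begin{cases} C\theta \ge D \\ \theta \ge 0 \end{cases}
\end{aligned}
$$

where $Q = \tilde{X}^\top \tilde{X}$, $R = \tilde{X}^\top Y$, $C = -\mathbf{1}^\top$ and $D = -\tau$. We again have $\hat{\beta}(\tau) = A\hat{\theta}$.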
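We have:

$$
\begin{aligned}
\mathrm{RSS}(\beta) &= (Y - X\beta)^\top (Y - X\beta) \\
&= \left(Y - X\left(\hat{\beta}^{\mathrm{ols}} + \beta - \hat{\beta}^{\mathrm{ols}}\right)\right)^\top \left(Y - X\left(\hat{\beta}^{\mathrm{ols}} + \beta - \hat{\beta}^{\mathrm{ols}}\right)\right) \\
&= \left(Y - X\hat{\beta}^{\mathrm{ols}}\right)^\top \left(Y - X\hat{\beta}^{\mathrm{ols}}\right) - 2\left(Y - X\hat{\beta}^{\mathrm{ols}}\right)^\top X \left(\beta - \hat{\beta}^{\mathrm{ols}}\right) + \left(\beta - \hat{\beta}^{\mathrm{ols}}\right)^\top X^\top X \left(\beta - \hat{\beta}^{\mathrm{ols}}\right)
\end{aligned}
$$

We notice that:

$$
\begin{aligned}
(*) &= \left(Y - X\hat{\beta}^{\mathrm{ols}}\right)^\top X \left(\beta - \hat{\beta}^{\mathrm{ols}}\right) \\
&= \left(Y^\top - \left(\hat{\beta}^{\mathrm{ols}}\right)^\top X^\top\right) X \left(\beta - \hat{\beta}^{\mathrm{ols}}\right) \\
&= \left(Y^\top - \left(\left(X^\top X\right)^{-1} X^\top Y\right)^\top X^\top\right) X \left(\beta - \hat{\beta}^{\mathrm{ols}}\right) \\
&= \left(Y^\top X - Y^\top X \left(X^\top X\right)^{-1} X^\top X\right) \left(\beta - \hat{\beta}^{\mathrm{ols}}\right) \\
&= 0
\end{aligned}
$$

Finally, we obtain:

$$\mathrm{RSS}(\beta) = \mathrm{RSS}\left(\hat{\beta}^{\mathrm{ols}}\right) + \left(\beta - \hat{\beta}^{\mathrm{ols}}\right)^\top X^\top X \left(\beta - \hat{\beta}^{\mathrm{ols}}\right)$$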
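If we consider the equation $\mathrm{RSS}(\beta) = c$, we distinguish three cases:

1. if $c < \mathrm{RSS}(\hat{\beta}^{\mathrm{ols}})$, there is no solution;

2. if $c = \mathrm{RSS}(\hat{\beta}^{\mathrm{ols}})$, there is one solution $\beta^\star = \hat{\beta}^{\mathrm{ols}}$;

3. if $c > \mathrm{RSS}(\hat{\beta}^{\mathrm{ols}})$, we have:

$$\left(\beta - \hat{\beta}^{\mathrm{ols}}\right)^\top A \left(\beta - \hat{\beta}^{\mathrm{ols}}\right) = 1$$

where:

$$A = \frac{X^\top X}{c - \mathrm{RSS}(\hat{\beta}^{\mathrm{ols}})}$$

The set of solutions $\beta^\star$ is an ellipsoid, whose center is $\hat{\beta}^{\mathrm{ols}}$ and whose principal axes are the
eigenvectors of the matrix $A$.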


If we add the lasso constraint $\sum_{k=1}^{K} |\beta_k| \le \tau$, the lasso estimator $\hat{\beta}(\tau)$ corresponds to the
tangency between the diamond shaped region and the ellipsoid associated with the
smallest value of $c$ that still reaches this region. The diamond shape of the region defined by
the lasso constraint ensures that the lasso estimator is sparse:


$$\exists\, \eta > 0 : \forall\, \tau < \eta, \quad \min\left(|\hat{\beta}_1(\tau)|, \ldots, |\hat{\beta}_K(\tau)|\right) = 0$$

For example, the two-dimensional case is represented in Figure 15.4. We notice that $\hat{\beta}_1(\tau)$
is equal to zero if $\tau < \eta$. This sparsity property is central for understanding the variable
selection procedure.




FIGURE 15.4: Interpretation of the lasso regression 


Example 164 Using the data given in Table 15.3, we consider the linear regression model:

$$y_i = \beta_0 + \sum_{k=1}^{5} \beta_k x_{i,k} + u_i \tag{15.4}$$

The objective is to determine the importance of each variable.

The lasso method can be used for ranking the variables. For that, we consider the
following linear regression:

$$\tilde{y}_i = \sum_{k=1}^{5} \tilde{\beta}_k \tilde{x}_{i,k} + \tilde{u}_i$$
TABLE 15.3: Data of the lasso regression problem

 i  |  y_i  | x_{i,1}  x_{i,2}  x_{i,3}  x_{i,4}  x_{i,5}
 1  |   3.1 |   2.8      4.3      0.3      2.2      3.5
 2  |  24.9 |   5.9      3.6      3.2      0.7      6.4
 ⋮  |   ⋮   |    ⋮        ⋮        ⋮        ⋮        ⋮
12  |  37.0 |   1.8      1.3      9.2      6.1      8.3
13  |  14.7 |   7.4      5.6      0.9      5.6      3.9
14  |  −3.2 |   2.3      6.6      0.0      3.6      6.4
15  |  44.3 |   7.7      2.2      6.5      1.3      0.7


where $\tilde{y}_i$ and $\tilde{x}_{i,k}$ are the standardized data⁸:

$$\frac{y_i - \bar{y}}{s_y} = \sum_{k=1}^{5} \tilde{\beta}_k \left(\frac{x_{i,k} - \bar{x}_k}{s_{x_k}}\right) + \tilde{u}_i \tag{15.5}$$

or equivalently:

$$y_i = \left(\bar{y} - \sum_{k=1}^{5} \frac{s_y}{s_{x_k}} \tilde{\beta}_k \bar{x}_k\right) + \sum_{k=1}^{5} \frac{s_y}{s_{x_k}} \tilde{\beta}_k x_{i,k} + s_y \tilde{u}_i$$

We deduce that $\beta_0 = \bar{y} - \sum_{k=1}^{5} (s_y / s_{x_k}) \tilde{\beta}_k \bar{x}_k$ and $\beta_k = (s_y / s_{x_k}) \tilde{\beta}_k$. When performing lasso
regression, we always standardize the data in order to obtain comparable betas. Otherwise,
the penalty function $\|\beta\|_1$ does not make a lot of sense. In Table 15.4, we have estimated the
lasso coefficients $\tilde{\beta}_k(\lambda)$ for different values of the shrinkage parameter $\lambda$. When $\lambda = 0$, we
obtain the OLS estimate, and the lasso regression selects all the available variables. When
$\lambda \to \infty$, the solution is $\tilde{\beta}(\infty) = 0$, and the lasso regression selects no explanatory variables.
In Table 15.4, we verify that the number of selected variables is a decreasing function of
$\lambda$. For instance, the lasso regression selects respectively four and three variables when $\lambda$ is
equal to 0.9 and 2.5. It follows that the most important variable is the third one, followed
by the first, second, fourth and fifth variables.
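The relationship between the standardized coefficients $\tilde{\beta}_k$ and the original coefficients $(\beta_0, \beta_k)$ can be verified numerically in the OLS case ($\lambda = 0$). The following sketch uses simulated data and hypothetical function names; it is an illustration of the back-transformation, not a reproduction of Example 164.

```python
import numpy as np

def standardized_fit(X, y):
    """OLS on standardized data, then back-transformation to the original scale."""
    x_bar, s_x = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_bar, s_y = y.mean(), y.std(ddof=1)
    X_tilde = (X - x_bar) / s_x
    y_tilde = (y - y_bar) / s_y
    beta_tilde, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
    beta = (s_y / s_x) * beta_tilde            # beta_k = (s_y / s_xk) * beta~_k
    beta0 = y_bar - np.sum(beta * x_bar)       # beta_0 = y_bar - sum_k beta_k x_bar_k
    return beta0, beta, beta_tilde

# Simulated data (hypothetical)
rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=[1.0, 2.0, 0.5], size=(100, 3))
y = 2.0 + X @ np.array([1.5, -0.7, 4.0]) + rng.normal(size=100)

beta0, beta, beta_tilde = standardized_fit(X, y)

# Direct OLS with an intercept on the raw data, for comparison
Z = np.column_stack([np.ones(len(y)), X])
direct, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(np.round(direct, 4), np.round(np.r_[beta0, beta], 4))
```

Both estimates coincide, whereas the standardized coefficients `beta_tilde` are expressed on a common scale and can be compared across variables.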


In Figure 15.5, we have reported the path of the lasso estimate $\tilde{\beta}(\lambda)$ with respect to the
scaling factor $\tau^\star \in [0, 1]$, which is defined as follows:

$$\tau^\star = \frac{\tau}{\tau_{\max}} = \frac{\|\tilde{\beta}(\lambda)\|_1}{\|\tilde{\beta}(0)\|_1}$$

$\tau^\star$ is equal to zero when $\lambda \to \infty$ (no selected variable) and one when $\lambda = 0$, which
corresponds to the OLS case. From this path, we verify the lasso ordering:

$$x_3 \succ x_1 \succ x_2 \succ x_4 \succ x_5$$

⁸ The notations $\bar{x}_k$ and $s_{x_k}$ represent the mean and the standard deviation of the data
$\{x_{i,k}, i = 1, \ldots, n\}$.


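TABLE 15.4: Results of the lasso regression

$\lambda$                              0.0       0.9       2.5       5.5       7.5
$\tilde{\beta}_1(\lambda)$             0.4586    0.4022    0.3163    0.1130
$\tilde{\beta}_2(\lambda)$            −0.1849   −0.2005   −0.1411
$\tilde{\beta}_3(\lambda)$             0.8336    0.7265    0.5953    0.3951    0.2462
$\tilde{\beta}_4(\lambda)$            −0.1893   −0.1102
$\tilde{\beta}_5(\lambda)$             0.0931
$\|\tilde{\beta}(\lambda)\|_1$         1.7595    1.4395    1.0527    0.5081    0.2462
$\mathrm{RSS}(\tilde{\beta}(\lambda))$ 0.0118    0.0304    0.1180    0.4076    0.6306
$R^2$                                  0.9874    0.9674    0.8735    0.5633    0.3244
$\mathrm{df}^{(\mathrm{model})}$       5         4         3         2         1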


FIGURE 15.5: Variable selection with the lasso regression 


15.1.3 Score modeling, validation and follow-up 


15.1.3.1 Cross-validation approach 


In order to avoid overfitting, we can also split the dataset into a training set and a
validation set. The training set is used to estimate the model, for example the vector $\theta$ in
the case of a parametric model, while the validation set is used to compute the prediction
error and the residual sum of squares. This approach can be generalized for model selection.
In this case, the training set is used to fit the several models, while the validation set is
used to select the right model (Hastie et al., 2009). We generally distinguish two types of
cross-validation.


1. In exhaustive cross-validation methods, learning and testing are based on all possible
ways to divide the original sample into a training set and a validation set. For example,
leave-p-out cross-validation (LpOCV) assumes that the validation set is composed of $p$
observations, while the training set corresponds to the remaining observations. Since
the number of training and validation sets is equal to $\binom{n}{p}$, this approach may be
computationally intensive. In order to reduce the complexity, we can choose $p = 1$.
This approach is called the leave-one-out cross-validation (LOOCV).


2. Non-exhaustive cross-validation methods split the original sample into training and
validation sets. For instance, the k-fold approach randomly divides the dataset into $k$
(almost) equally sized subsamples. At each iteration, one subsample is chosen as a
validation set, while the $k - 1$ remaining subsamples form the training set. This means
that the model is fitted using all but the $j$-th group of data, and the $j$-th group of data
is used for the test set. We repeat the procedure $k$ times, in such a way that each
subsample is tested exactly once. In the case of a linear regression, the k-fold
cross-validation error is generally computed, up to a normalization constant, as:

$$\mathcal{E}^{\mathrm{cv}} \propto \sum_{j=1}^{k} \sum_{i \in G_j} \left(y_i - x_i^\top \hat{\beta}_{(-j)}\right)^2$$

where $i \in G_j$ denotes the observations of the $j$-th subsample and $\hat{\beta}_{(-j)}$ the estimate
of $\beta$ obtained by leaving out the $j$-th subsample. Even in simple cases, it cannot be
guaranteed that the function $\mathcal{E}^{\mathrm{cv}}$ has a unique minimum. The simple grid search
approach is probably the best approach. The exhaustive leave-one-out cross-validation
(LOOCV) is a particular case when $k$ is equal to the size of the dataset. Moreover,
we can show that LOOCV is asymptotically equivalent to the AIC criterion (Stone,
1977).
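The k-fold procedure combined with a grid search can be sketched as follows for a ridge-type linear model. The function names, the grid, the simulated data and the $1/n$ normalization of the error are assumptions made for this illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator (X'X + lam I)^{-1} X'y."""
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)

def kfold_cv_error(X, y, lam, k=5, seed=0):
    """Mean squared prediction error over k randomly drawn folds."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    sse = 0.0
    for G_j in folds:
        train = np.setdiff1d(np.arange(n), G_j)          # all but the j-th group
        beta_minus_j = ridge(X[train], y[train], lam)
        sse += np.sum((y[G_j] - X[G_j] @ beta_minus_j) ** 2)
    return sse / n

# Simulated data (hypothetical)
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0]) + rng.normal(size=60)

# Grid search on the hyperparameter
grid = np.linspace(0.0, 10.0, 41)
errors = [kfold_cv_error(X, y, lam) for lam in grid]
lam_star = grid[int(np.argmin(errors))]
print(lam_star, min(errors))
```

Setting `k` equal to the number of observations turns the same function into a leave-one-out cross-validation.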


In order to illustrate the principle of cross-validation, we consider the ridge estimator:

$$\hat{\beta} = \arg\min\; \frac{1}{2} (Y - X\beta)^\top (Y - X\beta) + \frac{\lambda}{2} \beta^\top \beta$$

where $Y$ is an $n \times 1$ vector, $X$ is an $n \times K$ matrix and $\beta$ is a $K \times 1$ vector. The ridge model
is then a regularized linear regression model with an $L_2$-norm penalty (Hoerl and Kennard,
1970). It follows that the expression of $\hat{\beta}$ is equal to:

$$\hat{\beta} = \left(X^\top X + \lambda I_K\right)^{-1} X^\top Y$$


In the case of the leave-one-out cross-validation, Allen (1971, 1974) showed that the function
$\mathcal{E}^{\mathrm{cv}}$ has an explicit expression known as the predicted residual error sum of squares (or
PRESS) statistic:

$$\mathrm{Press} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_{i,-i}\right)^2$$

where $\hat{y}_{i,-i}$ is the estimate of $y_i$ based on the ridge model when leaving out the $i$-th
observation. Indeed, we have⁹:

$$\mathrm{Press} = \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{u}_i^2}{(1 - h_i)^2}$$

where $\hat{u}_i = y_i - x_i^\top \hat{\beta}$ and $h_i = x_i^\top \left(X^\top X + \lambda I_K\right)^{-1} x_i$. With this formula, we don’t need to
estimate the $n$ estimators $\hat{\beta}_{-i}$, where $\hat{\beta}_{-i}$ is the ridge estimator when leaving out the $i$-th
observation.
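The equivalence between the closed-form PRESS statistic and the brute-force leave-one-out computation can be checked with the following sketch. The data are simulated and the function names are hypothetical; the model has no intercept and $\lambda$ is held fixed across the $n$ refits, as in the formula above.

```python
import numpy as np

def press_closed_form(X, y, lam):
    """PRESS = (1/n) sum u_i^2 / (1 - h_i)^2 using a single ridge fit."""
    K = X.shape[1]
    M = np.linalg.inv(X.T @ X + lam * np.eye(K))
    u = y - X @ (M @ X.T @ y)                      # in-sample residuals
    h = np.einsum("ij,jk,ik->i", X, M, X)          # h_i = x_i' M x_i
    return float(np.mean(u ** 2 / (1.0 - h) ** 2))

def press_brute_force(X, y, lam):
    """PRESS obtained by refitting the ridge model n times."""
    n, K = X.shape
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(K),
                                 X[keep].T @ y[keep])
        errors.append((y[i] - X[i] @ beta_i) ** 2)
    return float(np.mean(errors))

# Simulated data (hypothetical)
rng = np.random.default_rng(5)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) + rng.normal(size=20)

fast = press_closed_form(X, y, lam=3.0)
slow = press_brute_force(X, y, lam=3.0)
print(fast, slow)
```

The closed form requires one matrix inversion instead of $n$, which is what makes a fine grid or a bisection on $\lambda$ inexpensive.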


TABLE 15.5: Data of the ridge regression problem 


t Y ıı n T2 T3 T4 T5 

I|-230! -80 60 -27 95 -75 
2| —21.0, —6.5 111 54 66 67 
3| —5.0 |! —14.4 -13.3 -3.2 08 10 
4| 39.6, -6.7 260 11.5 155 65 
S| 581 23 —71 -46 TG —0.6 
6| 13.6) 20 —13.0 —13.3 -0.9 -8.6 
7| 14.01 10.7 —4.9 —23.1 25 19.0 
S| —5.2! -8.5 10 42 —11.5 129 


9 6.9 3.4 4.9 9.5 —12.8 11.0 
10 | —5.2 0.0 5.1 14.3 3.8 10.0 
11 0.0 1.0 4.0 14.1 —3.5 —23.6 
12 3.0 2.4 1.6 1.2 4.8 9.2 
13 9.2; —0.1 —10.6 16.0 7.5 5.8 
14 26.1 15.2 2.5 5.3 —18.0 10.4 


15 6.3 19.2 20.7 —5.1 3.9 —13.8 
16 11.5 10.1 1.7 —12.1 —-2.7 13.9 
17 4.8 3.8 0.8 2.7 1.0 14.4 


13.1 6.6 1.6 -7.4 —3.5 


| 
! 
| 

18 35.2 | 23.1 1.2 —5.0 —16.1 3.3 
: 
| —19.0 0.7 0.8 —2.7 11.3 


Example 165 Using the data given in Table 15.5, we consider the linear regression model:

$$y_i = \sum_{k=1}^{5} \beta_k x_{i,k} + u_i$$

The objective is to determine the ridge parameter by cross-validation.

In order to estimate the optimal value of $\lambda$, we calculate the PRESS function and find
its minimum:

$$\lambda^\star = \arg\min\; \mathrm{Press}(\lambda)$$

In Figure 15.6, we have represented the PRESS function for several values of $\lambda$. Using a
bisection algorithm, we deduce that the optimal value is $\lambda^\star = 3.36$.


Remark 180 The ridge regression is a good example where we can obtain an analytical
formula for the cross-validation error $\mathcal{E}^{\mathrm{cv}}$. In most statistical models, this is not the case
and we have to use a grid search for selecting the optimal model. This approach may be
time-consuming. However, since the calibration of credit scoring models is done once per
year, it is not an issue. Nevertheless, the computational time of cross-validation may be
prohibitive with on-the-fly or real-time statistical models.

⁹ See Exercise 15.4.2 on page 1022.




FIGURE 15.6: Selection of the ridge parameter using the PRESS statistic 


15.1.3.2 Score modeling 


Score modeling is the backbone of credit scoring. This is why it is extensively studied in
the next two sections of this chapter. However, we present here some elements in order to
understand the main challenges. The score is generally a (non-linear) function of exogenous
variables $X$ and parameters $\theta$: $S = f(X; \theta)$. We assume that $\theta$ has been already estimated by
a statistical inference method and the model $S = f(x; \hat{\theta})$ has been validated. For example,
$f(x; \hat{\theta})$ may be a ridge regression model where $\hat{\theta} = (\hat{\beta}, \hat{\lambda})$ and $\lambda$ has been calibrated by
a cross-validation method. The score estimation $S = f(x; \hat{\theta})$ is the preliminary part of
the decision rule. Indeed, we now have to decide whether or not we select the applicant. It can be
done using the following rule:

$$
\begin{cases}
S < s \;\Rightarrow\; Y = 0 \;\Rightarrow\; \text{reject} \\
S \ge s \;\Rightarrow\; Y = 1 \;\Rightarrow\; \text{accept}
\end{cases}
$$


The difficulty lies in the choice of the cut-off $s$. For instance, if the model is a logit model,
the score is a probability between 0 and 1:

$$\Pr\{Y = 1\} = \Pr\{\text{having a good risk}\} = f(x; \hat{\theta})$$

At first sight, a natural cut-off is $s = 50\%$:

$$
\begin{cases}
S < 50\% \;\Rightarrow\; Y = 0 \;\Rightarrow\; \text{reject} \\
S \ge 50\% \;\Rightarrow\; Y = 1 \;\Rightarrow\; \text{accept}
\end{cases}
$$

However, we will see in Section 15.3 on page 1008 that $s = 50\%$ is not necessarily the
optimal cut-off, in particular when the population is heterogeneous¹⁰. Moreover, the decision
rule may be influenced by other factors that are not driven by a statistical point of view.
For example, if the loss associated with the selection of a bad risk is larger than the gain
associated with the selection of a good risk, the optimal cut-off may be larger than 50%.
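The effect of asymmetric gains and losses on the cut-off can be illustrated with a simple expected-profit rule. This is an assumption made for the sketch below, not the method developed in Section 15.3: if $p$ is the probability of a good risk, accepting the applicant yields an expected profit $p \cdot \text{gain} - (1 - p) \cdot \text{loss}$, which is positive when $p$ exceeds $\text{loss} / (\text{gain} + \text{loss})$.

```python
def breakeven_cutoff(gain, loss):
    """Cut-off above which the expected profit p*gain - (1-p)*loss is positive."""
    return loss / (gain + loss)

def decide(score, cutoff):
    """Decision rule: accept when the score reaches the cut-off."""
    return "accept" if score >= cutoff else "reject"

# Hypothetical figures: a bad risk costs four times what a good risk earns
symmetric = breakeven_cutoff(gain=1.0, loss=1.0)
asymmetric = breakeven_cutoff(gain=1.0, loss=4.0)
print(symmetric, asymmetric, decide(0.7, symmetric), decide(0.7, asymmetric))
```

With symmetric payoffs the rule reduces to the natural 50% cut-off, while a larger loss pushes the cut-off above 50%, so that an applicant with a score of 70% is accepted in the first case and rejected in the second.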


15.1.3.3 Score follow-up 


Once we have built a scoring system, we begin to collect new information about the
selected applicants. We can then backtest the score in order to check its robustness. Let
us consider a rating system, whose annual probability of default is given by the following
table:

Rating                    A       B      C      D      E       F
Probability of default    0.5%    1%     2%     5%     15%     25%

Each year, we can calculate for each grade the default frequency and adjust the decision
rule in order to obtain a coherent scoring system. Below, we have reported two examples of
default frequencies:

Rating    A        B       C       D       E        F
Year 1    0.05%    2.3%    2.8%    7.5%    22.6%    35.1%
Year 2    0.5%     2.7%    1.3%    2.0%    15.1%    25.1%

It is obvious that Year 2 produces figures closer to the expected result than Year 1. However,
Year 2 raises more concerns than Year 1 in terms of coherency. Indeed, the default
frequencies are not increasing between ratings B, C and D. On the contrary, we observe a
coherent ranking for Year 1, which faces an average default rate larger than predicted.
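Both observations can be reproduced with a short check on the figures of the two tables above. The helper names and the choice of the mean absolute gap as a measure of closeness are assumptions made for this sketch.

```python
# Figures taken from the two tables above (in decimal form)
pd_model = [0.005, 0.01, 0.02, 0.05, 0.15, 0.25]
year_1 = [0.0005, 0.023, 0.028, 0.075, 0.226, 0.351]
year_2 = [0.005, 0.027, 0.013, 0.020, 0.151, 0.251]

def is_coherent(frequencies):
    """True if default frequencies strictly increase from rating A to rating F."""
    return all(a < b for a, b in zip(frequencies, frequencies[1:]))

def mean_abs_gap(frequencies, probabilities):
    """Average absolute difference between observed and predicted default rates."""
    gaps = [abs(f - p) for f, p in zip(frequencies, probabilities)]
    return sum(gaps) / len(gaps)

for name, freq in [("Year 1", year_1), ("Year 2", year_2)]:
    print(name, is_coherent(freq), round(mean_abs_gap(freq, pd_model), 4))
```

Year 2 is closer to the predicted probabilities on average, but only Year 1 preserves the ranking of the grades.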

Besides the coherency issue, the stability of the scoring system is another important key 
element of the follow-up. Two axes of analysis can be conducted. The first one concerns 
the structure of the population with respect to the score. In the table below, we report the 
observed frequencies of each class: 


Rating    A      B      C      D      E      F
Year 0    25%    20%    20%    20%    10%    5%
Year 1    15%    20%    25%    17%    13%    10%
Year 2    15%    15%    30%    15%    15%    10%


We notice a change in the population distribution, implying that the original scoring system 
may be no longer valid. The second axis of analysis concerns the exogenous variables that 
compose the score. In this case, the analysis consists in comparing the structure of the 
population with respect to each variable. 
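One simple way to quantify the change in the population structure is to compute a distance between the rating distributions. The sketch below uses the total variation distance, which is a choice made here for illustration and is not prescribed by the text; the frequencies are those of the table above.

```python
# Observed frequencies of each rating class (A to F), from the table above
year_0 = [0.25, 0.20, 0.20, 0.20, 0.10, 0.05]
year_1 = [0.15, 0.20, 0.25, 0.17, 0.13, 0.10]
year_2 = [0.15, 0.15, 0.30, 0.15, 0.15, 0.10]

def total_variation(p, q):
    """Half the sum of absolute differences between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

drift_1 = total_variation(year_0, year_1)
drift_2 = total_variation(year_0, year_2)
print(round(drift_1, 4), round(drift_2, 4))
```

The distance to the initial distribution grows from Year 1 to Year 2, which is consistent with a population that progressively shifts towards the riskier grades.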


Another issue is the status of the rejected applicants. Indeed, there is an asymmetry
between applicants that are accepted for credit and the others. We know what accepted
applicants will become in terms of good/bad risk, but we don’t know what the good/bad
status of rejected applicants would have been (Hand and Henley, 1997):

“The behavior of those who have been rejected, if instead they had been accepted,
is unknown. If one estimates a model using data only on accepted applicants,
those estimated parameters may be biased when applied to all applicants.
In addition, if cut-off scores are chosen to equalize the actual and predicted number
of defaulting applicants then a sample of accepted applicants is likely to yield
inappropriate cut-offs for the population of all applicants” (Crook and Banasik,
2004, page 857).

¹⁰ For example when the number of good risks is larger than the number of bad risks.


The statistical study of these rejected applicants is called ‘reject inference’ and can be viewed
as a missing data problem¹¹ (Little and Rubin, 2014). Except when selected and rejected
populations are perfectly coherent with the scoring decision rule, the fact that we do not
observe the rejected population introduces a bias. Let us consider the example of a tight
decision rule implying that we never observe a bad risk. It is obvious that the calibrated
statistical model does not reflect the entire population, but only the selected population. The
issue is even more acute because a credit scoring model does not reduce to a statistical problem,
but is also used from a business point of view. Questions about the market share and the
other competitors are also essential. We have reported below an illustration:


Choice    Number of selected applicants    Default rate    Total profit (in $ mn)    Per-unit profit (in $)
#1        1 000 000                        5%              100                       100
#2        2 000 000                        7%              150                       75
#3        5 000 000                        10%             180                       36


What is the optimal choice? If the goal is to minimize the default rate, the best choice is 
#1. If the goal is to maximize the total profit, the third choice is optimal. There are several 
statistical approaches to perform reject inference (extrapolation, augmentation, reweighting, 
reclassification, etc.). However, they are not satisfactory because they focus on the default 
rate and ignore business issues. Nevertheless, they can help to test if the credit scoring 
model is biased (Banasik and Crook, 2007). 


15.2 Statistical methods 


Unsupervised learning is a branch of statistical learning, where test data does not in- 
clude a response variable. It is opposed to supervised learning, whose goal is to predict 
the value of the response variable Y given a set of explanatory variables X. In the case of 
unsupervised learning, we only know the X-values, because the Y-values do not exist or are 
not observed. Supervised and unsupervised learning are also called ‘learning with/without 
a teacher’ (Hastie et al., 2009). This metaphor means that we have access to the correct 
answer provided by the supervisor (or the teacher) in supervised learning. In the case of 
unsupervised learning, we have no feedback on the correct answer. For instance, the linear 
regression is a typical supervised learning model, whereas the principal component analysis 
is an approach of unsupervised learning. 


¹¹ We generally distinguish three types of missing value problems: missing completely at random or MCAR,
missing at random or MAR, and missing not at random or MNAR. Credit scoring models generally face
MAR or MNAR situations.

15.2.1 Unsupervised learning


In the following paragraphs, we focus on cluster analysis and dimension reduction, which 
are two unsupervised approaches for detecting commonalities in data. 


15.2.1.1 Clustering 


Cluster analysis is a method for the assignment of observations into groups or clusters.
It is then an exploratory data analysis which allows us to group similar observations together.
As a result, the objective of clustering methods is to maximize the proximity between
observations of the same cluster and to maximize the dissimilarity between observations which
belong to different clusters. In what follows, we consider two popular cluster methods:
K-means and hierarchical clustering.


K-means clustering It is a special case of combinatorial algorithms. This kind of
algorithm does not use a probability distribution but works directly on observed data. We
consider $n$ observations with $K$ attributes $x_{i,k}$ ($i = 1, \ldots, n$ and $k = 1, \ldots, K$). We note $x_i$
the $K \times 1$ vector $(x_{i,1}, \ldots, x_{i,K})$. We would like to build $n_C$ clusters $C_j$ defined by the index
$j$ where $j = 1, \ldots, n_C$ with the following properties:

1. clusters must be disjoint: $C_j \cap C_{j'} = \emptyset$ for $j \ne j'$;
2. clusters must describe the entire dataset: $C_1 \cup C_2 \cup \cdots \cup C_{n_C} = \{1, \ldots, n\}$;
3. observations assigned to a cluster are statistically similar.


Let $\mathcal{C}$ be the mapping function which assigns an observation to a cluster, meaning
that $\mathcal{C}(i) = j$ assigns the $i$-th observation to the $j$-th cluster $C_j$ ($j$ is also called the
corresponding label). The principle of combinatorial algorithms is to adjust the mapping function
$\mathcal{C}$ in order to minimize the following loss function (Hastie et al., 2009):

$$\mathcal{L}(\mathcal{C}) = \frac{1}{2} \sum_{j=1}^{n_C} \sum_{\mathcal{C}(i) = j} \sum_{\mathcal{C}(i') = j} d(x_i, x_{i'})$$

where $d(x_i, x_{i'})$ is the dissimilarity measure between the observations $i$ and $i'$. As a result,
the optimal mapping function is denoted $\mathcal{C}^\star = \arg\min \mathcal{L}(\mathcal{C})$.

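In the case of the K-means algorithm, the dissimilarity measure is the squared Frobenius distance
(or squared Euclidean norm):

$$d(x_i, x_{i'}) = \sum_{k=1}^{K} \left(x_{i,k} - x_{i',k}\right)^2 = \|x_i - x_{i'}\|^2$$

Therefore, the loss function becomes¹²:

$$\mathcal{L}(\mathcal{C}) = \sum_{j=1}^{n_C} n_j \sum_{\mathcal{C}(i) = j} \|x_i - \bar{x}_j\|^2$$

where $\bar{x}_j = (\bar{x}_{1,j}, \ldots, \bar{x}_{K,j})$ is the $(K \times 1)$ mean vector associated with the $j$-th cluster and
$n_j = \sum_{i=1}^{n} \mathbb{1}\{\mathcal{C}(i) = j\}$ is the corresponding number of observations. If we note
$\mu_j = \arg\min_{\mu} \sum_{\mathcal{C}(i) = j} \|x_i - \mu\|^2$, the previous minimization problem is equivalent to:

$$\left\{\mathcal{C}^\star, \mu_1^\star, \ldots, \mu_{n_C}^\star\right\} = \arg\min \sum_{j=1}^{n_C} n_j \sum_{\mathcal{C}(i) = j} \|x_i - \mu_j\|^2$$

where $\mu_j$ is called the centroid of cluster $C_j$. This minimization problem may be solved by
Lloyd’s iterative algorithm:

1. we initialize cluster centroids $\mu_1^{(0)}, \ldots, \mu_{n_C}^{(0)}$;

2. at the iteration $s$, we update the mapping function $\mathcal{C}^{(s)}$ using the following rule:

$$\mathcal{C}^{(s)}(i) = \arg\min_j \left\|x_i - \mu_j^{(s-1)}\right\|^2$$

3. we then compute the optimal centroids of the clusters $\mu_1^{(s)}, \ldots, \mu_{n_C}^{(s)}$:

$$\mu_j^{(s)} = \frac{1}{n_j^{(s)}} \sum_{\mathcal{C}^{(s)}(i) = j} x_i$$

4. we repeat steps 2 and 3 until convergence, that is when the assignments do not change:
$\mathcal{C}^{(s+1)} = \mathcal{C}^{(s)}$.

¹² In Exercise 15.4.3 on page 1023, we show that:

$$\frac{1}{2} \sum_{\mathcal{C}(i) = j} \sum_{\mathcal{C}(i') = j} \|x_i - x_{i'}\|^2 = n_j \sum_{\mathcal{C}(i) = j} \|x_i - \bar{x}_j\|^2$$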

We can show that the algorithm converges to a local minimum, implying that the main issue
is to determine if the solution is also a global minimum. The answer depends on the initial
choice of centroids. Generally, the algorithm is initialized with random centroids. In this
case, we can run the algorithm many times and choose the clusters that give the smallest
value of the function $\mathcal{L}(\mathcal{C})$. We also notice that the number of clusters is a hyperparameter
of the clustering model¹³. This implies that we have to test different values of $n_C$ in order
to find the ‘optimal’ partition.


TABLE 15.6: Data of the clustering problem

 i     X_1     X_2     X_3     X_4     X_5
 1     17.6    19.6    19.8    20.4    28.8
 2     13.2    17.5    17.5    17.4    24.2
 3     35.9    25.4    32.4    25.0    40.7
 4     28.1    24.0    25.1    28.7    26.7
 5     23.5    23.6    23.7    14.3    18.1
 6     36.5    30.3    29.5    32.0    29.5
 7     14.0    23.9    18.3    19.2    17.2
 8     36.7    29.0    30.3    21.1    28.7
 9     31.2    19.4    29.9    33.3    23.8
10     17.0    20.5    23.8    16.0    19.7

¹³ Originally, the K-means method defines K clusters by their means (or centroids). In this book, $K$ is
the number of explanatory variables. This is why we prefer to use the notation $j$ for cluster labeling, while
$n_C$ represents the number of classes.

TABLE 15.7: Optimal centroids $\mu_j^\star$ for 2 and 3 clusters

               X_1      X_2      X_3      X_4      X_5
$n_C = 2$
$\mu_1^\star$  17.06    21.02    20.62    17.46    21.60
$\mu_2^\star$  33.68    25.62    29.44    28.02    29.88
$n_C = 3$
$\mu_1^\star$  17.06    21.02    20.62    17.46    21.60
$\mu_2^\star$  36.37    28.23    30.73    26.03    32.97
$\mu_3^\star$  29.65    21.70    27.50    31.00    25.25

Example 166 We consider the clustering problem of 10 observations with five variables
$X_1$ to $X_5$. The data are reported in Table 15.6. We would like to know if two clusters are
sufficient or if we need more clusters to analyze the similarity.

By setting $n_C$ equal to 2, we obtain the following optimal clustering: $C_1^\star = \{1, 2, 5, 7, 10\}$
and $C_2^\star = \{3, 4, 6, 8, 9\}$. Optimal centroids are reported in Table 15.7. It follows that
$\mathcal{L}(C_1^\star, C_2^\star) = 3390.32$. In the case $n_C = 3$, the optimal clustering becomes $C_1^\star = \{1, 2, 5, 7, 10\}$,
$C_2^\star = \{3, 6, 8\}$ and $C_3^\star = \{4, 9\}$ while the loss function $\mathcal{L}(C_1^\star, C_2^\star, C_3^\star)$ is equal
to 1832.94. We notice that the K-means algorithm has split the second cluster into two
new clusters. The loss function does not help to determine the optimal number of clusters,
because $\mathcal{L}(\mathcal{C})$ tends to zero when $n_C$ increases. The most popular approach is the Elbow
method, which consists in drawing the percentage of variance explained as a function of
the number of clusters and detecting when the marginal gain is small. However, there is no
good solution to estimate $n_C$, because the existing methods generally overestimate the number of clusters¹⁴.
This is why it is better to fix the minimum number of observations by cluster. It is obvious
that two clusters are sufficient in our example, because $n_C = 3$ leads to having a cluster
with only two observations.
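Lloyd's algorithm and the loss value of Example 166 can be reproduced with the following sketch. The function names and the choice of the 2nd and 6th observations as initial centroids are assumptions made for this illustration; the data are those of Table 15.6, where the first attribute of the 6th observation (36.5) is the value implied by the centroids of Table 15.7.

```python
import numpy as np

# Data of Table 15.6 (10 observations, 5 variables)
X = np.array([
    [17.6, 19.6, 19.8, 20.4, 28.8], [13.2, 17.5, 17.5, 17.4, 24.2],
    [35.9, 25.4, 32.4, 25.0, 40.7], [28.1, 24.0, 25.1, 28.7, 26.7],
    [23.5, 23.6, 23.7, 14.3, 18.1], [36.5, 30.3, 29.5, 32.0, 29.5],
    [14.0, 23.9, 18.3, 19.2, 17.2], [36.7, 29.0, 30.3, 21.1, 28.7],
    [31.2, 19.4, 29.9, 33.3, 23.8], [17.0, 20.5, 23.8, 16.0, 19.7],
])

def kmeans_loss(X, labels):
    """L(C) = sum_j n_j * sum_{C(i)=j} ||x_i - mean_j||^2."""
    loss = 0.0
    for j in np.unique(labels):
        pts = X[labels == j]
        loss += len(pts) * np.sum((pts - pts.mean(axis=0)) ** 2)
    return float(loss)

def lloyd(X, init_centroids, max_iter=100):
    """Alternate assignment and centroid updates until the labels stop changing."""
    centroids = np.array(init_centroids, dtype=float)
    labels = None
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)               # step 2: nearest centroid
        if labels is not None and np.array_equal(new_labels, labels):
            break                                    # step 4: convergence
        labels = new_labels
        for j in range(len(centroids)):              # step 3: update centroids
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

labels, centroids = lloyd(X, X[[1, 5]])              # start from observations 2 and 6
print(labels + 1, np.round(centroids, 2), round(kmeans_loss(X, labels), 2))
```

With a random initialization, the same function would be called several times and the partition with the smallest loss retained.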


Hierarchical clustering The K-means clustering method presents several weak points.
First, it requires many iterations when the number of observations and the number of
clusters are large. Second, the solution highly depends on the cluster initialization, implying
that we need to run Lloyd’s algorithm many times in order to find the optimal clustering.
Third, the number of clusters is definitely an issue.


The idea of hierarchical clustering is to create a tree structure in order to model the
relationships between the different clusters. Unlike the K-means algorithm, this algorithm
does not depend on the number of clusters or the initialization assignment. However, it
depends on the dissimilarity measure between two clusters. In Figure 15.7, we have
represented an example of tree structure (or dendrogram) obtained by hierarchical clustering.
The 1st and 3rd observations are grouped in order to obtain a first cluster. This cluster is
then merged with the 5th observation in order to define a new cluster. In a similar way, the
6th and 7th observations are grouped to obtain a first cluster. This cluster is then merged
with the 10th observation in order to define a new cluster. The tree structure indicates how
two clusters are merged into a new single cluster. The lowest level of the tree corresponds to
the individual observations. In this case, each cluster contains one observation. The highest
level of the tree corresponds to the entire dataset. In this case, there is only one cluster that
contains all the observations.


¹⁴ For instance, we obtain $C_1^\star = \{1, 2\}$, $C_2^\star = \{5, 7, 10\}$, $C_3^\star = \{3, 6, 8\}$ and $C_4^\star = \{4, 9\}$ when $n_C = 4$.


FIGURE 15.7: An example of dendrogram 


Remark 181 There is a difference between a basic tree and a dendrogram. Indeed, the x- 
axis of the dendrogram corresponds to the dissimilarity measure. Therefore, we can easily 
see which merge creates small or large dissimilarity. 


We generally distinguish two approaches for hierarchical clustering: 


• in the agglomerative method (also called bottom-up clustering), the algorithm starts
with the individual clusters and recursively merges the closest pair of clusters into one
single cluster;

• in the divisive method (also called top-down clustering), the algorithm starts with
the single cluster containing all the observations and recursively splits a cluster into
two new clusters, which present the maximum dissimilarity.

Let $C_j$ and $C_{j'}$ be two clusters. The objective function of the agglomerative method is to
minimize the dissimilarity measure $D(C_j, C_{j'})$ while we maximize the dissimilarity measure
$D(C_j, C_{j'})$ in the divisive method. In what follows, we only consider the agglomerative
method, because it is more efficient in terms of computational time and it is more widely
used.


The dissimilarity measure $D(C_j, C_{j'})$ is defined as a linkage function of pairwise
dissimilarities $d(x_i, x_{i'})$ where $\mathcal{C}(i) = j$ and $\mathcal{C}(i') = j'$. Therefore, the agglomerative method
requires defining two dissimilarity measures: the linkage function between two clusters
$D(C_j, C_{j'})$ and the distance between two observations $d(x_i, x_{i'})$. For this last one, we
generally consider the Mahalanobis distance:

$$d(x_i, x_{i'}) = \sqrt{(x_i - x_{i'})^\top \hat{\Sigma}^{-1} (x_i - x_{i'})}$$

where $\hat{\Sigma}$ is the sample covariance matrix or the Minkowski distance:

$$d(x_i, x_{i'}) = \left(\sum_{k=1}^{K} \left|x_{i,k} - x_{i',k}\right|^p\right)^{1/p}$$

where $p \ge 1$. The case $p = 2$ corresponds to the Euclidean distance. For the linkage
function, we generally consider three approaches. The single linkage (or nearest neighbor)
is the smallest distance between the clusters:

$$D(C_j, C_{j'}) = \min_{\{\mathcal{C}(i) = j,\, \mathcal{C}(i') = j'\}} d(x_i, x_{i'})$$

The complete linkage (or furthest neighbor) is the largest distance between the clusters:

$$D(C_j, C_{j'}) = \max_{\{\mathcal{C}(i) = j,\, \mathcal{C}(i') = j'\}} d(x_i, x_{i'})$$

Finally, the average linkage is the average distance between the clusters:

$$D(C_j, C_{j'}) = \frac{1}{n_j n_{j'}} \sum_{\mathcal{C}(i) = j} \sum_{\mathcal{C}(i') = j'} d(x_i, x_{i'})$$

At each iteration, we search the clusters $j$ and $j'$ which minimize the dissimilarity measure
and we merge them into one single cluster. When we have merged all the observations
into one single cluster, the algorithm is stopped. It is also easy to perform a segmentation
by considering a particular level of the tree. Indeed, we notice that the algorithm requires
exactly $n - 1$ iterations. The level $L^{(s)} = s$ is then associated to the $s$-th iteration and we
note $D^{(s)} = D(C_{j^\star}, C_{j'^\star})$ the minimum value of $D(C_j, C_{j'})$.

In Figure 15.7, the dendrogram was based on simulated data using the single linkage rule 
and the Euclidean distance. We have considered 10 observations divided into two groups. 
The attributes of the first (resp. second) one correspond to simulated Gaussian variates 
with a mean of 20% (resp. 30%) and a standard deviation of 5% (resp. 5%). The intra- 
group cross-correlation is set to 80% whereas the inter-group correlation is equal to 0%. 
We obtain satisfactory results. Indeed, if we would like to consider two clusters, the first 
cluster is composed of the first five observations, whereas the second cluster is composed 
of the last five observations. In practice, hierarchical clustering may produce concentrated 
segmentation as illustrated in Figure 15.8. We use the same simulated data as previously 
except that the standard deviation for the second group is set to 25%. In this case, if we 
would like to consider two clusters, we obtain a cluster with 9 elements and another cluster 
with only one element (the 6th observation).


Let us consider Example 166 on page 946. By using the Euclidean distance, we obtain 
the dendrograms in Figure 15.9. If we would like to split the data into two clusters, we find 
for the three methods the solution {1,2,5,7,10} and {3,4,6,8,9}, which also corresponds 
to the solution given by the K-means analysis. In the case of the single linkage method, we
have reported in Table 15.8 for each level $L^{(s)}$ the distance $D^{(s)}$, the two nearest neighbours
$i^\star$ and $i'^\star$ and the created cluster $\mathcal{C}^{(s)}$. We notice that the solution for 3 clusters differs
from the K-means solution. Indeed, we find {1,2,5,7,10}, {4,6,8,9} and {3} for the single
linkage method.


FIGURE 15.8: Unbalanced clustering


TABLE 15.8: Agglomerative hierarchical clustering (single linkage)

$L^{(s)}$   $D^{(s)}$   $(i^\star, i'^\star)$   $\mathcal{C}^{(s)}$
    1        7.571      (5,10)      {5,10}
    2        7.695      (1,2)       {1,2}
    3        8.204      (5,7)       {5,7,10}
    4        9.131      (4,9)       {4,9}
    5        9.238      (1,5)       {1,2,5,7,10}
    6       11.037      (6,8)       {6,8}
    7       12.179      (4,6)       {4,6,8,9}
    8       13.312      (3,4)       {3,4,6,8,9}
    9       15.199      (1,3)       {1,2,3,4,5,6,7,8,9,10}


FIGURE 15.9: Comparison of the three dendrograms


15.2.1.2 Dimension reduction


We now turn to the concept of dimension reduction, which consists in finding some
common patterns in order to better explain the data. For instance, we might want to reduce
a dataset with 1000 variables to the two or three most important patterns. In machine
learning, dimension reduction is also known as feature extraction, which is defined as the
process of building new variables or features that are more informative and less redundant
than the original variables.


Principal component analysis Let $X$ be a $K \times 1$ random vector, whose covariance
matrix is equal to $\Sigma$. We consider the linear transform $Z = B^\top X$ where $B = (\beta_1, \ldots, \beta_K)$
is a $K \times K$ matrix and the $\beta_j$'s are $K \times 1$ vectors. The $j$-th element of $Z$ is denoted by $Z_j$
and we have $Z_j = \beta_j^\top X = \sum_{k=1}^K \beta_{k,j} X_k$. $Z_j$ is also called the $j$-th principal component. The
idea of PCA is to find a first linear function $\beta_1^\top X$ such that the variance of $Z_1$ is maximum
and then a $j$-th linear function $\beta_j^\top X$ such that the variance of $Z_j$ is maximum and $Z_j$ is
uncorrelated with $Z_1, \ldots, Z_{j-1}$ for all $j \geq 2$ (Jolliffe, 2002). We can show that $B$ is the
matrix of eigenvectors$^{15}$ of the covariance matrix $\Sigma$:

$$\Sigma B = B \Lambda$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$ is the diagonal matrix of eigenvalues with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_K$.
Since $\Sigma$ is a symmetric and positive definite matrix, we also have:

$$\Sigma = B \Lambda B^{-1} = B \Lambda B^\top$$

and $B$ is an orthonormal matrix. By construction, we have$^{16}$:

$$\mathrm{var}(Z_j) = \mathrm{var}(\beta_j^\top X) = \beta_j^\top \Sigma \beta_j = \Lambda_{j,j} = \lambda_j$$

and:

$$\mathrm{cov}(Z_j, Z_{j'}) = \beta_j^\top \Sigma \beta_{j'} = \Lambda_{j,j'} = 0$$

We deduce the spectral decomposition of the covariance matrix:

$$\Sigma = B \Lambda B^\top = \sum_{j=1}^K \lambda_j \beta_j \beta_j^\top$$

$^{15}$ See Exercise 15.4.4 on page 1024 for the derivation of this result.


We note $B_{(1:j)}$ and $B_{(K-j+1:K)}$ the matrices that contain the first and last $j$ columns of
$B$. We consider the random vector $Z = (Z_1, \ldots, Z_j) = B^\top X$ of dimension $j$ (here $B$ denotes
a $K \times j$ matrix). Here are some properties of the PCA (Jolliffe, 2002):


1. the trace of $\mathrm{cov}(Z)$ is maximized if $B = B_{(1:j)}$ corresponds to the $j$ first eigenvectors;


2. the trace of $\mathrm{cov}(Z)$ is minimized if $B = B_{(K-j+1:K)}$ corresponds to the $j$ last eigenvectors;


3. the covariance of $X$ given $Z$ is:

$$\mathrm{cov}(X \mid Z_1, \ldots, Z_j) = \Sigma_{X,X} - \Sigma_{X,Z} \Sigma_{Z,Z}^{-1} \Sigma_{Z,X} = \sum_{k=j+1}^K \lambda_k \beta_k \beta_k^\top$$


4. we consider the following linear regression model:

$$X = A Z + U$$

where $A$ is a $K \times j$ matrix and $U = (U_1, \ldots, U_K)$ is the vector of residuals; if we
note $\Omega = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_K^2)$ the covariance matrix of $U$, the trace of $\Omega$ is minimized if
$B = B_{(1:j)}$.
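
These relationships can be checked numerically. The sketch below is only an illustration: the $3 \times 3$ covariance matrix `sigma` is hypothetical (it is not taken from the text), and `numpy.linalg.eigh` returns the eigenvalues in ascending order, so they are re-sorted to match the convention $\lambda_1 \geq \cdots \geq \lambda_K$.

```python
import numpy as np

# Hypothetical 3 x 3 covariance matrix (not taken from the text)
sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.5],
                  [0.8, 0.5, 1.0]])

eigenvalues, eigenvectors = np.linalg.eigh(sigma)  # ascending order
order = np.argsort(eigenvalues)[::-1]              # re-sort in descending order
lam = eigenvalues[order]
B = eigenvectors[:, order]

# Covariance matrix of Z = B' X: diagonal, with var(Z_j) = lambda_j
cov_z = B.T @ sigma @ B

# Property 3 with j = 2: covariance of X given the first two components
j = 2
B_j = B[:, :j]
conditional_cov = sigma - sigma @ B_j @ np.linalg.inv(B_j.T @ sigma @ B_j) @ B_j.T @ sigma
residual_cov = sum(lam[k] * np.outer(B[:, k], B[:, k]) for k in range(j, len(lam)))

print(np.round(lam, 4))
print(np.round(cov_z, 4))
```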


Remark 182 The principal component analysis can be performed with correlation matrices 
instead of covariance matrices. Jolliffe (2002) presents different arguments for justifying this 
choice. Indeed, PCA makes more sense when the variables are comparable. Otherwise, the 
principal components are dominated by the variables with the largest variances. 


$^{16}$ Because of the following equality:

$$\Lambda = B^{-1} \Sigma B = B^\top \Sigma B$$


Example 167 We consider the random vector $X = (X_1, X_2, X_3, X_4)$, whose individual
variances are equal to 1, 2, 3 and 4. The correlation matrix is:

$$\rho = \begin{pmatrix} 1.00 & & & \\ 0.30 & 1.00 & & \\ 0.50 & 0.10 & 1.00 & \\ 0.20 & 0.50 & -0.50 & 1.00 \end{pmatrix}$$

We have reported the eigendecomposition of $\Sigma$ and $\rho$ in Tables 15.9 and 15.10. We
observe some differences. For instance, the first principal component of the covariance matrix
$\Sigma$ is:

$$Z_1 = 0.18 \cdot X_1 + 0.33 \cdot X_2 + 0.53 \cdot X_3 + 0.76 \cdot X_4$$

whereas the first principal component of the correlation matrix $\rho$ is:

$$Z_1 = 0.48 \cdot X_1 + 0.44 \cdot X_2 + 0.53 \cdot X_3 + 0.55 \cdot X_4$$

We verify that the sum of eigenvalues is equal to the sum of variances for the covariance
matrix, and the number of variables for the correlation matrix$^{17}$.


TABLE 15.9: Eigendecomposition of the covariance matrix

               $\beta_1$   $\beta_2$   $\beta_3$   $\beta_4$
$X_1$            0.18     −0.20     −0.57      0.77
$X_2$            0.33      0.58     −0.63     −0.40
$X_3$            0.53     −0.73     −0.13     −0.41
$X_4$            0.76      0.31      0.50      0.27
$\lambda_j$      5.92      2.31      1.31      0.46


TABLE 15.10: Eigendecomposition of the correlation matrix

               $\beta_1$   $\beta_2$   $\beta_3$   $\beta_4$
$X_1$            0.48     −0.44     −0.65     −0.40
$X_2$            0.44      0.67     −0.40      0.45
$X_3$            0.53     −0.51      0.38      0.57
$X_4$            0.55      0.33      0.53     −0.56
$\lambda_j$      2.06      0.97      0.73      0.23


We now develop the interpretation tools of PCA. The quality of representation is defined
as the percentage of total variance that is explained by the $j$-th principal component (or PC):

$$Q_j = \frac{\lambda_j}{\sum_{k=1}^K \lambda_k}$$

We have $0 \leq Q_j \leq 1$. The cumulative quality of representation is just the cumulative sum
of the quality values:

$$Q_j^\star = \sum_{k=1}^j Q_k$$

$Q_j^\star$ is also called the quality of representation of the $j$-th principal plane$^{18}$. The correlation
between the variable $X_k$ and the factor $Z_j$ is given by:

$$\mathrm{cor}(X_k, Z_j) = \beta_{k,j} \sqrt{\lambda_j}$$

It follows that the quality of representation of the variable $X_k$ with respect to the $j$-th PC
is$^{19}$:

$$Q_{k,j} = \mathrm{cor}^2(X_k, Z_j) = \beta_{k,j}^2 \lambda_j$$

We can also define the contribution of the variable $X_k$ to the $j$-th PC$^{20}$:

$$C_{k,j} = \beta_{k,j}^2$$

In order to understand the association between variables, we generally plot the correlation
circle between two principal components, which corresponds to the scatterplot of
$\mathrm{cor}(X_k, Z_j)$ and $\mathrm{cor}(X_k, Z_{j'})$.

$^{17}$ We have $\mathrm{trace}(\Sigma) = \sum_{j=1}^K \sigma_j^2$ and $\mathrm{trace}(B \Lambda B^{-1}) = \mathrm{trace}(\Lambda B^{-1} B) = \mathrm{trace}(\Lambda) = \sum_{j=1}^K \lambda_j$. For the
correlation matrix, we deduce that $\sum_{j=1}^K \lambda_j = K$.


Remark 183 In practice, we estimate the covariance or the correlation matrix using a
sample. Let $x_i = (x_{i,1}, \ldots, x_{i,K})$ be the $i$-th observation. We note $z_{i,j} = \beta_j^\top x_i$ the projection
of $x_i$ onto the $j$-th principal component. The quality of representation and the contribution
of an observation to a principal component are then equal to:

$$Q_{i,j} = \frac{z_{i,j}^2}{\sum_{k=1}^K z_{i,k}^2}$$

and:

$$C_{i,j} = \frac{z_{i,j}^2}{\sum_{i'=1}^n z_{i',j}^2}$$

We consider again Example 166 on page 946. In Table 15.11, we have reported the results
of the PCA applied to the correlation matrix of data. The first PC explains 68.35% of the
variance, while the quality of representation of the second PC is equal to 14.54%. This means
that we can explain 82.89% of the variance with only two factors. For the first factor, each
variable has a positive loading. This is not the case of the second factor, where the factor
loadings of $X_1$, $X_2$ and $X_3$ are negative. We notice that $X_1$ and $X_3$ are well represented by
$Z_1$ (95.81% and 86.95%). For the second PC, the second variable $X_2$ is the most represented
(41.40%). If we consider the last PC, the quality of representation is poor (less than 1%).
This indicates that the last PC has a very low explanatory power. We notice that the rationale
of the fourth PC is to model $X_3$ because the second and third PCs do not explain this
variable. The contribution values $C_{k,j}$ are also interesting to confirm the previous results.
For instance, $X_1$ does not contribute to $Z_2$. It follows that the second PC represents the
opposition of $X_2$ with respect to $X_4$ and $X_5$. Clearly, the third PC mainly concerns $X_4$ and
$X_5$. Figure 15.10 represents the scatterplot of the factor values $z_{i,j}$ for the first two principal
components. We notice that the second component classifies the observations in the same
way as the K-means algorithm or the agglomerative hierarchical clustering. Indeed, we
retrieve the two clusters {1,2,5,7,10} and {3,4,6,8,9}. This is not the case of the first
component, which operates the following classification {1,2,3,4,9} and {5,6,7,8,10}. In
Figure 15.11, we have reported the correlation circle between different PCs. If we consider
the first two PCs, the variables $X_1$ and $X_2$ are clearly opposed to the variables $X_4$ and $X_5$.
The second panel confirms the competition between $X_4$ and $X_5$ due to the third PC.


$^{18}$ That is the plane composed of the first $j$ principal components.
$^{19}$ We verify that the sum of $Q_{k,j}$ is equal to the variance of the $j$-th PC:

$$\sum_{k=1}^K Q_{k,j} = \sum_{k=1}^K \beta_{k,j}^2 \lambda_j = \lambda_j \sum_{k=1}^K \beta_{k,j}^2 = \lambda_j$$

$^{20}$ We verify that the sum of $C_{k,j}$ is equal to 100%:

$$\sum_{k=1}^K C_{k,j} = \sum_{k=1}^K \beta_{k,j}^2 = 1$$


TABLE 15.11: Principal component analysis of Example 166

Factor            $Z_1$      $Z_2$      $Z_3$      $Z_4$      $Z_5$
$\lambda_j$       3.4173     0.7271     0.5548     0.2783     0.0226
$Q_j$             68.35%     14.54%     11.10%      5.57%      0.45%
$Q_j^\star$       68.35%     82.89%     93.98%     99.55%    100.00%

Matrix $B$ of eigenvectors
$X_1$             0.5295    −0.1015    −0.0567    −0.2554     0.8006
$X_2$             0.3894    −0.7546    −0.0500     0.4855    −0.2019
$X_3$             0.5044    −0.0188    −0.0247    −0.6650    −0.5499
$X_4$             0.3952     0.5318    −0.6238     0.3995    −0.1107
$X_5$             0.3967     0.3702     0.7775     0.3120    −0.0609

Correlation between $X_k$ and $Z_j$
$X_1$             97.88%     −8.66%     −4.22%    −13.47%     12.03%
$X_2$             71.98%    −64.35%     −3.72%     25.61%     −3.03%
$X_3$             93.25%     −1.60%     −1.84%    −35.08%     −8.27%
$X_4$             73.06%     45.35%    −46.46%     21.07%     −1.66%
$X_5$             73.34%     31.57%     57.91%     16.46%     −0.92%

Quality of representation of each variable $Q_{k,j}$
$X_1$             95.81%      0.75%      0.18%      1.82%      1.45%
$X_2$             51.81%     41.40%      0.14%      6.56%      0.09%
$X_3$             86.95%      0.03%      0.03%     12.31%      0.68%
$X_4$             53.38%     20.57%     21.59%      4.44%      0.03%
$X_5$             53.78%      9.96%     33.54%      2.71%      0.01%

Contribution of each variable $C_{k,j}$
$X_1$             28.04%      1.03%      0.32%      6.52%     64.09%
$X_2$             15.16%     56.94%      0.25%     23.57%      4.08%
$X_3$             25.44%      0.04%      0.06%     44.22%     30.24%
$X_4$             15.62%     28.29%     38.91%     15.96%      1.23%
$X_5$             15.74%     13.70%     60.46%      9.73%      0.37%
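
As an illustration, the interpretation statistics of Table 15.11 can be recomputed from the eigenvalues and the matrix of eigenvectors alone. The sketch below simply copies the rounded values of $\lambda_j$ and $B$ from the table, so the results only match the reported percentages up to rounding; the variable names are arbitrary.

```python
import numpy as np

# Eigenvalues and eigenvectors copied (rounded) from Table 15.11
lam = np.array([3.4173, 0.7271, 0.5548, 0.2783, 0.0226])
B = np.array([
    [0.5295, -0.1015, -0.0567, -0.2554,  0.8006],
    [0.3894, -0.7546, -0.0500,  0.4855, -0.2019],
    [0.5044, -0.0188, -0.0247, -0.6650, -0.5499],
    [0.3952,  0.5318, -0.6238,  0.3995, -0.1107],
    [0.3967,  0.3702,  0.7775,  0.3120, -0.0609],
])

quality = lam / lam.sum()              # Q_j
cumulative = np.cumsum(quality)        # cumulative quality of representation
correlation = B * np.sqrt(lam)         # cor(X_k, Z_j) = beta_{k,j} * sqrt(lambda_j)
quality_variable = correlation ** 2    # Q_{k,j}
contribution = B ** 2                  # C_{k,j}

print(np.round(100 * quality, 2))
print(np.round(100 * cumulative, 2))
print(np.round(100 * correlation, 2))
```

Each column of `contribution` sums to approximately one, and each column of `quality_variable` sums to approximately the corresponding eigenvalue, as stated in the footnotes.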


FIGURE 15.10: Scatterplot of the factor values $z_{i,1}$ and $z_{i,2}$


FIGURE 15.11: PCA correlation circle (panels: 1st PC / 2nd PC and 2nd PC / 3rd PC)


Non-negative matrix factorization There are several alternative approaches to principal
component analysis. For instance, independent component analysis (ICA) estimates additive
factors that are maximally independent. Another popular method is the non-negative
matrix factorization (NMF). Let $A$ be a non-negative $m \times p$ matrix. We define the NMF
decomposition of $A$ as follows:

$$A \approx BC$$

where $B$ and $C$ are two non-negative matrices with respective dimensions $m \times n$ and $n \times p$.
Compared to classic decomposition algorithms, we remark that $BC$ is an approximation
of $A$. There are also different ways to obtain this approximation, meaning that $B$ and $C$
are not necessarily unique$^{21}$. We also notice that the decomposition $A \approx BC$ is equivalent
to $A^\top \approx C^\top B^\top$. It means that the storage of the data is not important. Rows of $A$ may
represent either the observations or the variables, but the interpretation of the $B$ and $C$
matrices depends on the choice of the storage. We remark that:

$$A_{i,j} = \sum_{k=1}^n B_{i,k} C_{k,j}$$

Suppose that we consider a variable/observation storage. Therefore, $B_{i,k}$ depends on the
variable $i$ whereas $C_{k,j}$ depends on the observation $j$. In this case, we may interpret $B$ as
a matrix of weights. In factor analysis, $B$ is called the loading matrix and $C$ is the factor
matrix. $B_{i,k}$ is then the weight of factor $k$ for variable $i$ and $C_{k,j}$ is the value taken by factor
$k$ for observation $j$. If we use an observation/variable storage, which is the common way to
store data in statistics, $B$ and $C$ become the factor matrix and the loading matrix.


Because the dimensions m, n and p may be very large, one of the difficulties with NMF 
is to derive a numerical algorithm with a reasonable computational time. Lee and Seung 
(1999) developed a simple algorithm with strong performance and applied it to pattern 
recognition with success. Since this seminal work, this algorithm has been improved and 
there are today several ways to obtain a non-negative matrix factorization. In order to find 
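the approximate factorization, we need to define the loss function $\mathcal{L}$ which measures the
quality of the factorization. The optimization program is then:

$$\{B^\star, C^\star\} = \arg\min \mathcal{L}(A, BC) \quad \text{u.c.} \quad \left\{ \begin{array}{l} B \geq 0 \\ C \geq 0 \end{array} \right. \qquad (15.6)$$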


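Lee and Seung (2001) considered two loss functions. The first one is the Frobenius norm:

$$\mathcal{L}(A, BC) = \sum_{i=1}^m \sum_{j=1}^p \left( A_{i,j} - (BC)_{i,j} \right)^2$$

whereas the second one is the Kullback-Leibler divergence:

$$\mathcal{L}(A, BC) = \sum_{i=1}^m \sum_{j=1}^p \left( A_{i,j} \ln \frac{A_{i,j}}{(BC)_{i,j}} - A_{i,j} + (BC)_{i,j} \right)$$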


To solve Problem (15.6), Lee and Seung (2001) proposed to use the multiplicative update
algorithm. Let $B_{(s)}$ and $C_{(s)}$ be the matrices at iteration $s$. For the Frobenius norm$^{22}$, we
have:

$$\begin{aligned} B_{(s+1)} &= B_{(s)} \odot \left( A C_{(s)}^\top \right) \oslash \left( B_{(s)} C_{(s)} C_{(s)}^\top \right) \\ C_{(s+1)} &= C_{(s)} \odot \left( B_{(s+1)}^\top A \right) \oslash \left( B_{(s+1)}^\top B_{(s+1)} C_{(s)} \right) \end{aligned}$$

where $\odot$ and $\oslash$ are respectively the element-wise multiplication and division operators.
Under some assumptions, we may show that $B^\star = B_{(\infty)}$ and $C^\star = C_{(\infty)}$, meaning that the
multiplicative update algorithm converges to the optimal solution.


$^{21}$ Let $D$ be a nonnegative matrix such that $D^{-1}$ is nonnegative too. For example, $D$ may be a permutation
of a diagonal matrix. In this case, we have:

$$A \approx B D^{-1} D C \approx B' C'$$

where $B' = B D^{-1}$ and $C' = DC$ are two nonnegative matrices. This shows that the decomposition is not
unique.
$^{22}$ A similar algorithm may be derived for the Kullback-Leibler divergence.
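
A minimal sketch of the multiplicative update rule for the Frobenius loss is given below. It is an illustration under stated assumptions, not a reference implementation: the function name `nmf_multiplicative_update`, the uniform random initialization, the fixed number of iterations and the small constant `eps` added to the denominators (to avoid divisions by zero) are all choices made for this example.

```python
import numpy as np

def nmf_multiplicative_update(A, n_factors, n_iter=200, eps=1e-12, seed=0):
    """Approximate A (m x p, non-negative) by B C with B (m x n) and C (n x p)."""
    rng = np.random.default_rng(seed)
    m, p = A.shape
    B = rng.uniform(size=(m, n_factors))  # random non-negative starting values
    C = rng.uniform(size=(n_factors, p))
    losses = []
    for _ in range(n_iter):
        # Element-wise multiplication (*) and division (/) play the role of
        # the operators used in the update equations
        B = B * (A @ C.T) / (B @ C @ C.T + eps)
        C = C * (B.T @ A) / (B.T @ B @ C + eps)
        losses.append(float(np.sum((A - B @ C) ** 2)))  # Frobenius loss
    return B, C, losses

# Hypothetical non-negative data matrix
A = np.random.default_rng(1).uniform(size=(6, 5))
B, C, losses = nmf_multiplicative_update(A, n_factors=2)
print(losses[0], losses[-1])
```

Because the updates only multiply non-negative quantities, $B$ and $C$ stay non-negative at every iteration without any explicit projection.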


For large datasets, the computational time to find the optimal solution may be large
with the previous algorithm. Since the seminal work of Lee and Seung, a lot of methods have
also been proposed to improve the multiplicative update algorithm and speed up the
convergence. Among these methods, we may mention the algorithm developed by Lin (2007),
which is based on the alternating non-negative least squares:

$$\left\{ \begin{array}{l} B_{(s+1)} = \arg\min \mathcal{L}(A, B C_{(s)}) \\ C_{(s+1)} = \arg\min \mathcal{L}(A, B_{(s+1)} C) \end{array} \right. \qquad (15.7)$$

with the constraints $B_{(s+1)} \geq 0$ and $C_{(s+1)} \geq 0$. We notice that the two optimization
problems (15.7) are symmetric because we may cast the first problem in the form of the
second problem: $B_{(s+1)}^\top = \arg\min \mathcal{L}(A^\top, C_{(s)}^\top B^\top)$. So, we may only focus on the following
optimization problem:

$$C^\star = \arg\min \mathcal{L}(A, BC) \quad \text{u.c.} \quad C \geq 0$$

In the case of the Frobenius norm, we have $\partial_C \mathcal{L}(A, BC) = 2 B^\top (BC - A)$. The projected
gradient method consists in the following iterating scheme:

$$C \leftarrow C - \alpha \cdot \frac{\partial \mathcal{L}(A, BC)}{\partial C}$$

where $\alpha$ is the descent length. Let $(\beta, \gamma)$ be two scalars in $]0,1[$. Instead of finding the
optimal value of $\alpha$ at each iteration, Lin (2007) proposed to update $\alpha$ in a very simple way
depending on the inequality equation:

$$(1 - \gamma) \left( \frac{\partial \mathcal{L}(A, BC)}{\partial C} \right)^\top (\tilde{C} - C) + \frac{1}{2} (\tilde{C} - C)^\top \frac{\partial^2 \mathcal{L}(A, BC)}{\partial C \, \partial C^\top} (\tilde{C} - C) \leq 0$$

where $\tilde{C}$ is the update of $C$. If this inequality equation is verified, $\alpha$ is increased ($\alpha \leftarrow \alpha / \beta$),
otherwise $\alpha$ is decreased ($\alpha \leftarrow \alpha \beta$).


Remark 184 The choice of $B_{(0)}$ and $C_{(0)}$ for initializing NMF algorithms is important.
The random method consists in generating matrices with positive random numbers$^{23}$. Another
popular approach is the non-negative double singular value decomposition, which is a
modification of the singular value decomposition by considering only the non-negative part
of the singular values (Boutsidis and Gallopoulos, 2008).


In order to understand why NMF is different from other factor methods, we consider a
simulation study. We consider a basket of four financial assets. The asset prices are driven by
a multidimensional geometric Brownian motion. The drift parameter is equal to 5% whereas
the diffusion parameter is 20%. The cross-correlation $\rho_{i,j}$ between assets $i$ and $j$ is equal to
20%, but $\rho_{1,2} = 70\%$ and $\rho_{3,4} = 50\%$. In order to preserve the time homogeneity, the data
correspond to $x_{i,t} = \ln S_{i,t}$ where $S_{i,t}$ is the price of the asset $i$ at time $t$. In Figure 15.12,
we report the time series $x_{i,t}$ for the four assets (panel 1) and the first factor estimated
by NMF$^{24}$ (panel 2) and PCA (panel 3). We notice that the NMF factor$^{25}$ is not scaled in
the same way as the PCA factor. However, the correlation between the first differences
is equal to 98.8%. In the first panel in Figure 15.13, we compare the decomposition of the
variance according to the factors. We notice that PCA explains more variance than NMF
for a given number of factors. We obtain this result because NMF may be viewed as a
constrained principal component analysis with nonnegative matrices. However, it does not
mean that each PCA factor explains more variance than the corresponding NMF factor.
For example, the second NMF factor explains more variance than the second PCA factor
in Figure 15.13. In the other panels, we compare the dynamics of the first asset with the
dynamics given by the NMF factors$^{26}$. With three risk factors, the reconstructed signal has
a correlation of 93.7% with the original signal.


$^{23}$ For example, we can use the probability distributions $\mathcal{U}_{[0,1]}$ or $|\mathcal{N}(0,1)|$.


FIGURE 15.12: Estimating the first factor of a basket of financial assets (panels: Asset prices, First factor (NMF), First factor (PCA); horizontal axis: $t$ in years)


$^{24}$ The NMF decomposition corresponds to:

$$\underbrace{X}_{n_X \times n_T} \approx \underbrace{B}_{n_X \times n_F} \, \underbrace{C}_{n_F \times n_T}$$

where $n_X$ is the number of time-series, $n_T$ is the number of dates and $n_F$ is the number of NMF factors.
$^{25}$ In this example, $B$ is the loading matrix while $C$ is the matrix of time-series factors.
$^{26}$ The reconstructed multidimensional signal is just the matrix product $BC$ for different values of $n_F$.


FIGURE 15.13: Variance decomposition and signal reconstruction


15.2.2 Parametric supervised methods
15.2.2.1 Discriminant analysis


Discriminant analysis was first developed by Fisher (1936). This approach is close to the
principal component analysis (PCA) and is used to predict class membership for independent
variables. For that, we assume that we have $n_{\mathcal{C}}$ disjoint classes $\mathcal{C}_j$ where $j = 1, \ldots, J$
and $J = n_{\mathcal{C}}$. Discriminant analysis consists then in assigning an observation to one and only
one class. We consider an input vector $x$ and we divide the input space into $n_{\mathcal{C}}$ decision regions,
whose boundaries are called decision boundaries (Bishop, 2006). Classification methods can
then be seen as supervised clustering methods, where the categorical response variable is
directly the class. For example, Figure 15.14 corresponds to a classification problem with
seven classes and two explanatory variables $X_1$ and $X_2$. The goal is then to predict for
each observation its class. For instance, we would like the model to predict that the first
observation belongs to the first class, the second observation belongs to the fifth class, etc.


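FIGURE 15.14: Classification statistical problem


The two-dimensional case Using the Bayes theorem, we have:

$$\Pr\{A \cap B\} = \Pr\{A \mid B\} \cdot \Pr\{B\} = \Pr\{B \mid A\} \cdot \Pr\{A\}$$

It follows that:

$$\Pr\{A \mid B\} = \Pr\{B \mid A\} \cdot \frac{\Pr\{A\}}{\Pr\{B\}}$$

If we apply this result to the conditional probability $\Pr\{i \in \mathcal{C}_1 \mid X = x\}$, we obtain:

$$\Pr\{i \in \mathcal{C}_1 \mid X = x\} = \Pr\{X = x \mid i \in \mathcal{C}_1\} \cdot \frac{\Pr\{i \in \mathcal{C}_1\}}{\Pr\{X = x\}}$$

The log-probability ratio is then equal to:

$$\begin{aligned} \ln \frac{\Pr\{i \in \mathcal{C}_1 \mid X = x\}}{\Pr\{i \in \mathcal{C}_2 \mid X = x\}} &= \ln \left( \frac{\Pr\{X = x \mid i \in \mathcal{C}_1\}}{\Pr\{X = x \mid i \in \mathcal{C}_2\}} \cdot \frac{\Pr\{i \in \mathcal{C}_1\}}{\Pr\{i \in \mathcal{C}_2\}} \right) \\ &= \ln \frac{\Pr\{X = x \mid i \in \mathcal{C}_1\}}{\Pr\{X = x \mid i \in \mathcal{C}_2\}} + \ln \frac{\Pr\{i \in \mathcal{C}_1\}}{\Pr\{i \in \mathcal{C}_2\}} \\ &= \ln \frac{f_1(x)}{f_2(x)} + \ln \frac{\pi_1}{\pi_2} \end{aligned}$$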


where $\pi_j = \Pr\{i \in \mathcal{C}_j\}$ is the probability of the $j$-th class and $f_j(x) = \Pr\{X = x \mid i \in \mathcal{C}_j\}$ is
the conditional probability density function of $X$. By construction, the decision boundary
is defined such that we are indifferent to an assignment rule ($i \in \mathcal{C}_1$ and $i \in \mathcal{C}_2$), implying
that:

$$\Pr\{i \in \mathcal{C}_1 \mid X = x\} = \Pr\{i \in \mathcal{C}_2 \mid X = x\} = \frac{1}{2}$$

Finally, we deduce that the decision boundary satisfies the following equation:

$$\ln \frac{f_1(x)}{f_2(x)} + \ln \frac{\pi_1}{\pi_2} = 0$$

If we model each class density as a multivariate normal distribution:

$$X \mid i \in \mathcal{C}_j \sim \mathcal{N}(\mu_j, \Sigma_j)$$

we have:

$$f_j(x) = \frac{1}{(2\pi)^{K/2} |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_j)^\top \Sigma_j^{-1} (x - \mu_j) \right)$$

We deduce that:

$$\ln \frac{f_1(x)}{f_2(x)} = \frac{1}{2} \ln \frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} (x - \mu_1)^\top \Sigma_1^{-1} (x - \mu_1) + \frac{1}{2} (x - \mu_2)^\top \Sigma_2^{-1} (x - \mu_2)$$

The decision boundary is then given by:

$$\frac{1}{2} \ln \frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2} (x - \mu_1)^\top \Sigma_1^{-1} (x - \mu_1) + \frac{1}{2} (x - \mu_2)^\top \Sigma_2^{-1} (x - \mu_2) + \ln \frac{\pi_1}{\pi_2} = 0 \qquad (15.8)$$

Since the decision boundary is quadratic in $x$, this approach is called quadratic discriminant
analysis (QDA).


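If we assume that $\Sigma_1 = \Sigma_2 = \Sigma$, Equation (15.8) becomes:

$$\frac{1}{2} (x - \mu_2)^\top \Sigma^{-1} (x - \mu_2) - \frac{1}{2} (x - \mu_1)^\top \Sigma^{-1} (x - \mu_1) + \ln \frac{\pi_1}{\pi_2} = 0$$

or:

$$(\mu_2 - \mu_1)^\top \Sigma^{-1} x = \frac{1}{2} \left( \mu_2^\top \Sigma^{-1} \mu_2 - \mu_1^\top \Sigma^{-1} \mu_1 \right) + \ln \frac{\pi_1}{\pi_2} \qquad (15.9)$$

It follows that the decision boundary is then linear in $x$. This is why this approach is called
the linear discriminant analysis (LDA).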


Example 168 We consider two classes and two explanatory variables $X = (X_1, X_2)$ where
$\pi_1 = 50\%$, $\pi_2 = 1 - \pi_1 = 50\%$, $\mu_1 = (1, 3)$, $\mu_2 = (4, 1)$, $\Sigma_1 = I_2$ and $\Sigma_2 = \gamma I_2$ where
$\gamma = 1.5$.


FIGURE 15.15: Boundary decision of discriminant analysis 


By solving Equations (15.8) and (15.9), we obtain the QDA and LDA decision boundaries$^{27}$
reported in Figure 15.15. We verify that the LDA decision boundary is linear while
the QDA decision region is convex. For each class, we have also simulated 50 realizations. We
observe that the discriminant analysis performs the right classification most of the time.
However, we notice that two observations from class $\mathcal{C}_1$ and one observation from class $\mathcal{C}_2$
are not properly classified. In Figure 15.16, we analyze the impact of the parameters on
the decision boundary. The top/left panel corresponds to the previous example, whereas we
only change one parameter for each other panel. For instance, we increase the variance of
the second variable in the top/right panel. We observe that the impact on the LDA decision
boundary is minor, but this is not the case for the QDA decision boundary. Indeed, the
convexity is stronger because $X_2$ can take larger values than $X_1$. This is why for the
extreme values, the QDA decision boundary can be approximated by a vertical line when
$x_1 \to -\infty$ and a horizontal line when $x_2 \to +\infty$. Let us now introduce a correlation $\rho$ between
$X_1$ and $X_2$. It follows that the QDA decision boundary becomes more and more linear
when we increase $\rho$ (bottom/left panel). Finally, the impact of the probabilities $(\pi_1, \pi_2)$ is
crucial as shown in the bottom/right panel. It is obvious that the boundary decision moves
to the right when $\pi_1$ increases, because the decision region concerning $i \in \mathcal{C}_1$ must be larger.
For instance, we must always accept $i \in \mathcal{C}_1$ at the limit case $\pi_1 = 100\%$.


$^{27}$ For the linear discriminant analysis, we have used $\Sigma = (\Sigma_1 + \Sigma_2)/2$.
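
To make the two boundary equations concrete, the sketch below evaluates the left-hand side of Equation (15.8) with the parameters of Example 168, and obtains the LDA version by plugging the averaged matrix $\Sigma = (\Sigma_1 + \Sigma_2)/2$ into the same expression. The helper names `log_ratio`, `qda` and `lda` are hypothetical; a positive value favours class $\mathcal{C}_1$ and the decision boundary is the zero level set.

```python
import numpy as np

def log_ratio(x, pi1, pi2, mu1, mu2, sigma1, sigma2):
    """Left-hand side of Equation (15.8); positive values favour class 1."""
    x, mu1, mu2 = (np.asarray(v, dtype=float) for v in (x, mu1, mu2))
    d1 = (x - mu1) @ np.linalg.solve(sigma1, x - mu1)  # quadratic form, class 1
    d2 = (x - mu2) @ np.linalg.solve(sigma2, x - mu2)  # quadratic form, class 2
    log_det = 0.5 * np.log(np.linalg.det(sigma2) / np.linalg.det(sigma1))
    return log_det - 0.5 * d1 + 0.5 * d2 + np.log(pi1 / pi2)

# Parameters of Example 168
pi1 = pi2 = 0.5
mu1, mu2 = np.array([1.0, 3.0]), np.array([4.0, 1.0])
sigma1, sigma2 = np.eye(2), 1.5 * np.eye(2)
pooled = 0.5 * (sigma1 + sigma2)  # common covariance matrix used for LDA

def qda(x):
    return log_ratio(x, pi1, pi2, mu1, mu2, sigma1, sigma2)

def lda(x):
    return log_ratio(x, pi1, pi2, mu1, mu2, pooled, pooled)

midpoint = 0.5 * (mu1 + mu2)
print(qda(mu1), qda(mu2), lda(midpoint))
```

With equal priors and a common covariance matrix, the midpoint between the two means lies on the LDA boundary, which gives a quick sanity check of Equation (15.9).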


FIGURE 15.16: Impact of the parameters on LDA/QDA boundary decisions (panel labels include $\rho = 50\%$ and $\pi_1 = 95\%$)


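The general case We can generalize the previous analysis to $J$ classes. In this case, the
Bayes formula gives:

$$\Pr\{i \in \mathcal{C}_j \mid X = x\} = \Pr\{X = x \mid i \in \mathcal{C}_j\} \cdot \frac{\Pr\{i \in \mathcal{C}_j\}}{\Pr\{X = x\}} = c \cdot f_j(x) \cdot \pi_j$$

where $c = 1/\Pr\{X = x\}$ is a normalization constant that does not depend on $j$. We note
$S_j(x) = \ln \Pr\{i \in \mathcal{C}_j \mid X = x\}$ the discriminant score function for the $j$-th class. We have:

$$S_j(x) = \ln c + \ln f_j(x) + \ln \pi_j$$

If we again assume that $X \mid i \in \mathcal{C}_j \sim \mathcal{N}(\mu_j, \Sigma_j)$, we obtain:

$$\begin{aligned} S_j(x) &= \ln c' + \ln \pi_j - \frac{1}{2} \ln |\Sigma_j| - \frac{1}{2} (x - \mu_j)^\top \Sigma_j^{-1} (x - \mu_j) \\ &\propto \ln \pi_j - \frac{1}{2} \ln |\Sigma_j| - \frac{1}{2} (x - \mu_j)^\top \Sigma_j^{-1} (x - \mu_j) \end{aligned} \qquad (15.10)$$

where $\ln c' = \ln c - \frac{K}{2} \ln 2\pi$. Given an input $x$, we calculate the scores $S_j(x)$ for $j = 1, \ldots, J$
and we choose the label $j^\star$ with the highest score value. As in the two-class case, we can
assume a homoscedastic model ($\Sigma_j = \Sigma$), implying that the discriminant score function
becomes:

$$\begin{aligned} S_j(x) &= \ln c' + \ln \pi_j - \frac{1}{2} \ln |\Sigma| - \frac{1}{2} (x - \mu_j)^\top \Sigma^{-1} (x - \mu_j) \\ &\propto \ln \pi_j + \mu_j^\top \Sigma^{-1} x - \frac{1}{2} \mu_j^\top \Sigma^{-1} \mu_j \end{aligned} \qquad (15.11)$$

where $\ln c'' = \ln c' - \frac{1}{2} \ln |\Sigma| - \frac{1}{2} x^\top \Sigma^{-1} x$ collects the terms that do not depend on $j$.
Equation (15.11) defines the LDA score function, whereas Equation (15.10) defines the QDA
score function.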
Remark 185 In practice, the parameters $\pi_j$, $\mu_j$ and $\Sigma_j$ are unknown. We replace them
by the corresponding estimates $\hat{\pi}_j$, $\hat{\mu}_j$ and $\hat{\Sigma}_j$. For the linear discriminant analysis, $\hat{\Sigma}$ is
estimated by pooling all the classes.


Example 169 We consider the classification problem of 33 observations with two explanatory
variables $X_1$ and $X_2$, and three classes $\mathcal{C}_1$, $\mathcal{C}_2$ and $\mathcal{C}_3$. The data are reported in Table
15.12.


TABLE 15.12: Data of the classification problem

 $i$  $\mathcal{C}_j$  $X_1$   $X_2$  |  $i$  $\mathcal{C}_j$  $X_1$   $X_2$  |  $i$  $\mathcal{C}_j$  $X_1$   $X_2$
  1    1    1.03    2.85   |  12    2    3.70    5.08   |  23    3    3.55    0.58
  2    1    0.20    3.30   |  13    2    2.81    1.99   |  24    3    3.86    1.83
  3    1    1.69    3.73   |  14    2    3.66    2.61   |  25    3    5.39    0.47
  4    1    0.98    3.52   |  15    2    5.63    4.19   |  26    3    3.15   −0.18
  5    1    0.98    5.15   |  16    2    3.35    3.64   |  27    3    4.93    1.91
  6    1    3.47    6.56   |  17    2    2.97    3.55   |  28    3    3.87    2.61
  7    1    3.94    4.68   |  18    2    3.16    2.92   |  29    3    4.09    1.43
  8    1    1.55    5.99   |  19    3    3.00    0.98   |  30    3    3.80    2.11
  9    1    1.15    3.60   |  20    3    3.09    1.99   |  31    3    2.79    2.10
 10    2    1.20    2.27   |  21    3    5.45    0.60   |  32    3    4.49    2.71
 11    2    3.66    5.49   |  22    3    3.59   −0.46   |  33    3    3.51    1.82


The first step is to estimate the parameters $\pi_j$, $\mu_j$ and $\Sigma_j$, whose values$^{28}$ are reported in
Table 15.13. The second step consists in calculating the score function $S_j(x)$ for each class
$j$ using Equations (15.10) and (15.11). Results are given in Table 15.14. Besides the QDA
and LDA methods, we have also considered a third approach LDA$^2$, which corresponds
to a linear discriminant analysis by including the squared values of variables. This means
that the explanatory variables are $X_1$, $X_2$, $X_1^2$ and $X_2^2$ in LDA$^2$. By including polynomials,
the LDA$^2$ method is more convex than the original LDA method, and can be seen as an
approximation of the QDA method.

If we consider the first observation, the maximum score is reached for the first class
(−2.28 for QDA, 0.21 for LDA and 6.93 for LDA$^2$). If we consider the 14th observation,
QDA and LDA predict the third class, whereas LDA$^2$ predicts the second class, which is
the true value. In Figure 15.17, we have reported the class assignment performed by the
three approaches, and we have indicated the bad class predictions by a circle. In order
to understand these results, we have also calculated the decision regions in Figure 15.18.
According to QDA, the decision boundary is almost linear between $\mathcal{C}_1$ and $\mathcal{C}_2$, whereas it
is quadratic between $\mathcal{C}_2$ and $\mathcal{C}_3$. LDA produces linear decision boundaries, but the decision
surface for $\mathcal{C}_1$ has changed. Finally, LDA$^2$ can produce complex decision surfaces, even more
complex than those produced by QDA.


$^{28}$ For the LDA method, we have:

$$\hat{\Sigma} = \begin{pmatrix} 1.91355 & -0.71720 \\ -0.71720 & 3.01577 \end{pmatrix}$$


TABLE 15.13: Parameter estimation of the discriminant analysis

Class                  $\mathcal{C}_1$               $\mathcal{C}_2$               $\mathcal{C}_3$
$\hat{\pi}_j$          0.273               0.273               0.455
$\hat{\mu}_j$      (1.666, 4.376)      (3.349, 3.527)      (3.904, 1.367)
$\hat{\Sigma}_j$    1.525   0.929       1.326   0.752       0.694  −0.031
                    0.929   1.663       0.752   1.484      −0.031   0.960


FIGURE 15.17: Comparing QDA, LDA and LDA$^2$ predictions
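
The score functions (15.10) and (15.11) can be evaluated directly from the estimated parameters. The sketch below uses the rounded estimates of Table 15.13 and the pooled covariance matrix given in the footnote, then computes the QDA and LDA scores of the first observation $x_1 = (1.03, 2.85)$. The function names are hypothetical, and because the inputs are rounded the scores only match the first row of Table 15.14 to about two decimals.

```python
import numpy as np

# Rounded estimates taken from Table 15.13
pi = np.array([0.273, 0.273, 0.455])
mu = np.array([[1.666, 4.376], [3.349, 3.527], [3.904, 1.367]])
sigma = np.array([
    [[1.525, 0.929], [0.929, 1.663]],
    [[1.326, 0.752], [0.752, 1.484]],
    [[0.694, -0.031], [-0.031, 0.960]],
])
# Pooled covariance matrix used for LDA (value given in the footnote)
sigma_pooled = np.array([[1.91355, -0.71720], [-0.71720, 3.01577]])

def qda_scores(x):
    """Equation (15.10), up to the constant that does not depend on j."""
    scores = []
    for pi_j, mu_j, sigma_j in zip(pi, mu, sigma):
        dev = x - mu_j
        quad = dev @ np.linalg.solve(sigma_j, dev)
        scores.append(np.log(pi_j) - 0.5 * np.log(np.linalg.det(sigma_j)) - 0.5 * quad)
    return np.array(scores)

def lda_scores(x):
    """Equation (15.11), up to the constant that does not depend on j."""
    w = np.linalg.solve(sigma_pooled, mu.T).T  # row j holds Sigma^{-1} mu_j
    return np.log(pi) + w @ x - 0.5 * np.sum(w * mu, axis=1)

x_first = np.array([1.03, 2.85])
print(np.round(qda_scores(x_first), 2), np.round(lda_scores(x_first), 2))
print(np.argmax(qda_scores(x_first)) + 1, np.argmax(lda_scores(x_first)) + 1)
```

The predicted label is the index of the highest score, which is the first class for both methods in this case.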


TABLE 15.14: Computation of the discriminant scores $S_j(x)$

              QDA                         LDA                         LDA$^2$
 $i$   $S_1(x)$  $S_2(x)$  $S_3(x)$  |  $S_1(x)$  $S_2(x)$  $S_3(x)$  |  $S_1(x)$  $S_2(x)$  $S_3(x)$
  1     −2.28    −3.69    −7.49  |    0.21    −0.96    −0.79  |    6.93     5.60     5.76
  2     −2.28    −6.36   −12.10  |   −0.26    −2.17    −2.34  |    1.38    −2.13    −1.89
  3     −1.76    −3.13    −6.79  |    2.84     2.16     1.71  |   12.13    12.01    11.38
  4     −1.80    −4.43    −8.88  |    1.35     0.09    −0.22  |    7.73     6.20     5.93
  5     −2.36    −7.75   −13.70  |    4.32     2.93     1.45  |    8.12     5.54     4.76
  6     −3.16    −5.63   −14.68  |   10.75    11.36     8.95  |   14.82    13.99    12.96
  7     −3.79    −1.92    −6.32  |    8.06     9.22     8.15  |   17.36    19.03    17.89
  8     −2.85    −8.43   −15.23  |    6.73     5.76     3.70  |   10.47     8.09     7.15
  9     −1.74    −4.12    −8.37  |    1.76     0.64     0.27  |    8.94     7.77     7.39
 10     −3.14    −3.21    −6.17  |   −0.58    −1.56    −0.98  |    6.59     5.59     6.15
 11     −2.87    −3.01    −9.45  |    9.10     9.96     8.31  |   16.89    17.65    16.42
 12     −3.04    −2.38    −7.77  |    8.42     9.34     7.98  |   17.28    18.50    17.28
 13     −6.32    −2.29    −1.62  |    1.41     1.82     2.64  |   12.48    13.94    14.46
 14     −6.91    −2.07    −1.42  |    3.86     4.94     5.34  |   15.15    17.41    17.34
 15     −9.79    −3.62    −7.12  |    9.79    12.43    11.75  |   12.58    14.01    13.50
 16     −3.90    −1.47    −3.44  |    5.25     5.99     5.65  |   16.84    18.82    18.03
 17     −3.31    −1.55    −3.61  |    4.50     4.92     4.63  |   16.25    17.95    17.21
 18     −4.84    −1.60    −2.19  |    3.65     4.28     4.45  |   15.51    17.48    17.14
 19    −10.21    −4.12    −1.27  |   −0.13     0.52     2.06  |    8.98     9.99    11.70
 20     −7.05    −2.41    −1.24  |    1.85     2.50     3.32  |   12.99    14.72    15.22
 21    −23.11   −11.16    −2.56  |    2.98     5.75     7.61  |    3.79     4.57     7.26
 22    −19.22    −9.53    −2.42  |   −1.84    −0.57     2.01  |    1.81     1.53     5.51
 23    −13.86    −5.92    −1.01  |   −0.01     1.15     2.98  |    7.65     8.67    10.95
 24    −10.01    −3.43    −0.70  |    2.75     4.07     5.02  |   12.84    14.95    15.65
 25    −23.48   −11.44    −2.54  |    2.65     5.38     7.33  |    3.40     4.09     6.95
 26    −15.87    −7.59    −2.30  |   −2.01    −1.14     1.23  |    3.19     3.02     6.50
 27    −14.09    −5.40    −1.52  |    4.56     6.78     7.70  |   11.17    13.24    14.08
 28     −7.55    −2.27    −1.39  |    4.18     5.45     5.85  |   15.10    17.44    17.40
 29    −12.40    −4.67    −0.61  |    2.38     3.92     5.17  |   11.21    13.14    14.33
 30     −8.85    −2.87    −0.88  |    3.17     4.41     5.17  |   13.77    15.97    16.37
 31     −5.97    −2.17    −1.72  |    1.58     1.97     2.70  |   12.78    14.26    14.67
 32     −9.40    −2.97    −1.81  |    5.33     7.11     7.46  |   14.55    16.95    16.93
 33     −8.84    −3.01    −0.80  |    2.19     3.21     4.16  |   12.82    14.77    15.45


Class separation maximization In the following, we show that the linear discriminant
analysis is equivalent to maximizing class separability and is also related to the principal
component analysis. We note $x_i = (x_{i,1}, \ldots, x_{i,K})$ the $K \times 1$ vector of exogenous variables
$X$ for the $i$-th observation. The mean vector and the variance (or scatter) matrix of Class $\mathcal{C}_j$
are equal to:

$$\hat{\mu}_j = \frac{1}{n_j} \sum_{i \in \mathcal{C}_j} x_i$$

and$^{29}$:

$$S_j = n_j \hat{\Sigma}_j = \sum_{i \in \mathcal{C}_j} (x_i - \hat{\mu}_j) (x_i - \hat{\mu}_j)^\top$$

where $n_j$ is the number of observations in the $j$-th class. If we consider the total population, we
also have:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i$$

and:

$$S = n \hat{\Sigma} = \sum_{i=1}^n (x_i - \hat{\mu}) (x_i - \hat{\mu})^\top$$

We notice that:

$$\hat{\mu} = \frac{1}{n} \sum_{j=1}^J n_j \hat{\mu}_j$$


$^{29}$ The variance matrix is equal to the unscaled covariance matrix and is also called the scatter matrix.


[Figure 15.18: decision regions; panels QDA and LDA; vertical axis $X_2$]


We define the between-class variance matrix as:

$$S_B = \sum_{j=1}^J n_j (\hat{\mu}_j - \hat{\mu}) (\hat{\mu}_j - \hat{\mu})^\top$$

and the within-class variance matrix as:

$$S_W = \sum_{j=1}^J S_j$$

We can show that the total variance matrix can be decomposed into the sum of the within-class
and between-class variance matrices$^{30}$:

$$S = S_W + S_B$$

The discriminant analysis defined by Fisher (1936) consists in finding the discriminant
linear combination $\beta^\top X$ that has the maximum between-class variance relative to
the within-class variance: $\beta^\star = \arg\max J(\beta)$ where $J(\beta)$ is the Fisher criterion:

$$J(\beta) = \frac{\beta^\top S_B \beta}{\beta^\top S_W \beta}$$

Since the objective function is invariant if we rescale the vector $\beta$ ($J(\beta') = J(\beta)$ if $\beta' = c\beta$),
we can impose that $\beta^\top S_W \beta = 1$. It follows that:

$$\hat{\beta} = \arg\max \beta^\top S_B \beta \quad \text{s.t.} \quad \beta^\top S_W \beta = 1 \qquad (15.12)$$

The Lagrange function is:

$$\mathcal{L}(\beta; \lambda) = \beta^\top S_B \beta - \lambda \left( \beta^\top S_W \beta - 1 \right)$$

We deduce that the first-order condition is equal to:

$$\frac{\partial \mathcal{L}(\beta; \lambda)}{\partial \beta} = 2 S_B \beta - 2 \lambda S_W \beta = 0 \qquad (15.13)$$

It is remarkable that we obtain a generalized eigenvalue problem$^{31}$ $S_B \beta = \lambda S_W \beta$ or equivalently:

$$S_W^{-1} S_B \beta = \lambda \beta \qquad (15.14)$$

Even if $S_W$ and $S_B$ are two symmetric matrices, it is not necessarily the case for the product
$S_W^{-1} S_B$. Using the eigendecomposition $S_B = V \Lambda V^\top$, we have $S_B^{1/2} = V \Lambda^{1/2} V^\top$. With the
parametrization $\alpha = S_B^{1/2} \beta$, Equation (15.14) becomes:

$$S_B^{1/2} S_W^{-1} S_B^{1/2} \alpha = \lambda \alpha \qquad (15.15)$$

because $\beta = S_B^{-1/2} \alpha$. Equation (15.15) defines a right regular eigenvalue problem. Let $\lambda_k$
and $v_k$ be the $k$-th eigenvalue and eigenvector of the symmetric matrix $S_B^{1/2} S_W^{-1} S_B^{1/2}$. It is
obvious that the optimal solution $\alpha^\star$ is the first eigenvector $v_1$ corresponding to the largest
eigenvalue $\lambda_1$. We conclude that the estimator is $\hat{\beta} = S_B^{-1/2} v_1$ and the discriminant linear
relationship is $Y^c = v_1^\top S_B^{-1/2} X$. Moreover, we have$^{32}$:

$$J(\hat{\beta}) = \frac{\hat{\beta}^\top S_B \hat{\beta}}{\hat{\beta}^\top S_W \hat{\beta}} = \lambda_1$$


$^{30}$ See Exercise 15.4.5 on page 1024.


In Exercise 15.4.5 on page 1024, we show that the Fisher discriminant analysis is equiva- 
lent to the linear discriminant analysis in the case of two classes. This result can be extended 
to multiple classes and explains why this approach is also called Fisher linear discriminant 
analysis. 


Example 170 We consider a problem with two classes $\mathcal{C}_1$ and $\mathcal{C}_2$, and two explanatory variables $(X_1, X_2)$. Class $\mathcal{C}_1$ is composed of 7 observations: (1,2), (1,4), (3,6), (3,3), (4,2), (5,6), (5,5), whereas class $\mathcal{C}_2$ is composed of 6 observations: (1,0), (2,1), (4,1), (3,2), (6,4) and (6,5).


In Figure 15.19, we have reported these 13 observations in the plane $(x_1, x_2)$. The computation of the first generalized eigenvector gives $\beta = (0.7547, -0.9361)$. We deduce that the slope of the optimal line direction is $\beta_1/\beta_2 = -0.8062$. Computing the Fisher score $s_i = \beta^\top x_i$ for the $i$-th observation is then equivalent to performing the orthogonal projection of


31See Appendix A.1.1.2 on page 1034 for the definition of the generalized eigendecomposition. 
32Thanks to Equation (15.13), we have $S_B \beta = \lambda S_W \beta$ and $\beta^\top S_B \beta = \lambda \beta^\top S_W \beta$.
the points on this optimal line (Bishop, 2006). Concerning the assignment decision, we can consider the midpoint rule:

$$\begin{cases} s_i < \bar{\mu} \Rightarrow s_i \in \mathcal{C}_1 \\ s_i > \bar{\mu} \Rightarrow s_i \in \mathcal{C}_2 \end{cases}$$

where $\bar{\mu} = (\bar{\mu}_1 + \bar{\mu}_2)/2$, $\bar{\mu}_1 = \beta^\top \hat{\mu}_1$ and $\bar{\mu}_2 = \beta^\top \hat{\mu}_2$. However, this rule is not always optimal because it does not depend on the variances $\bar{\sigma}_1^2$ and $\bar{\sigma}_2^2$ of the scores of each class. In Figure 15.20, we have reported the Gaussian density of the scores for the two classes. Since we observe that the first class has a larger variance, the previous rule is not appropriate. This is why we can use the tools presented in Section 15.3 in order to calibrate the optimal decision rule.
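The computation of Example 170 can be sketched numerically as follows. This is a minimal illustration with NumPy, not the author's implementation: the helper `scatter` and the variable names are assumptions of this sketch, and the eigenvector returned by the solver is normalized differently from the vector reported in the text, so only its direction (the ratio $\beta_1/\beta_2$) is comparable.

```python
import numpy as np

# Observations of Example 170
C1 = np.array([(1, 2), (1, 4), (3, 6), (3, 3), (4, 2), (5, 6), (5, 5)], dtype=float)
C2 = np.array([(1, 0), (2, 1), (4, 1), (3, 2), (6, 4), (6, 5)], dtype=float)

def scatter(x, center):
    """Unscaled covariance (scatter) matrix of the rows of x around center."""
    d = x - center
    return d.T @ d

mu1, mu2 = C1.mean(axis=0), C2.mean(axis=0)
mu = np.vstack([C1, C2]).mean(axis=0)

# Within-class and between-class variance matrices
S_W = scatter(C1, mu1) + scatter(C2, mu2)
S_B = sum(len(c) * np.outer(m - mu, m - mu) for c, m in ((C1, mu1), (C2, mu2)))

# Generalized eigenvalue problem S_B beta = lambda S_W beta,
# solved here through the (non-symmetric) matrix S_W^{-1} S_B
eigval, eigvec = np.linalg.eig(np.linalg.solve(S_W, S_B))
beta = np.real(eigvec[:, np.argmax(np.real(eigval))])
ratio = beta[0] / beta[1]  # invariant to the scale and sign of beta

# Fisher scores and cut-off of the midpoint rule
scores1, scores2 = C1 @ beta, C2 @ beta
cutoff = 0.5 * (beta @ mu1 + beta @ mu2)
```

The ratio `beta[0] / beta[1]` does not depend on the normalization chosen by the eigenvalue solver, which is why it is the quantity to compare with the text.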



FIGURE 15.19: Linear projection and the Fisher solution 


Remark 186 $v_1^\top S_B^{-1/2} X$ is called the first canonical or discriminant variable (Hastie et al., 2009) and we denote it by $Y^c_{(1)}$. The previous analysis can be used to find the second canonical variable $Y^c_{(2)} = \beta_{(2)}^\top X$ that is not correlated to $Y^c_{(1)}$ such that $J(\beta_{(2)})$ is maximum. The solution is $\beta_{(2)} = S_B^{-1/2} v_2$ where $v_2$ is the eigenvector associated to the second largest eigenvalue $\lambda_2$. This method can be extended to the general problem of finding the $k$-th canonical variable $Y^c_{(k)} = \beta_{(k)}^\top X$ that is not correlated to $(Y^c_{(1)}, \ldots, Y^c_{(k-1)})$ such that $J(\beta_{(k)})$ is maximum. Again, we can show that the solution is $\beta_{(k)} = S_B^{-1/2} v_k$ where $v_k$ is the eigenvector associated to the $k$-th largest eigenvalue $\lambda_k$. The computation of the $K$ linear relationships $Y^c_{(k)} = \beta_{(k)}^\top X$ is called the multiple discriminant analysis (MDA). MDA can be seen as a generalized PCA method by taking into account a categorical response variable. Indeed, PCA performs an eigendecomposition of $S$ (or $\hat{\Sigma}$) whereas MDA performs an eigendecomposition of $S_W^{-1} S_B$.

FIGURE 15.20: Class separation and the cut-off criterion 


15.2.2.2 Binary choice models 


The underlying idea of such models is to estimate the probability of a binary response based on several explanatory variables. They have been developed in several fields of research (biology, epidemiology, economics, etc.). In statistics, the two seminal papers are again written by Fisher (1935) and Cox (1958). Since these publications, these models have been extended and now represent a major field of study in statistics and econometrics³³.


General framework In this section, we assume that $Y$ can take two values 0 and 1. We consider models that link the outcome to a set of factors $X$:

$$\Pr\{Y = 1 \mid X = x\} = F(x^\top \beta)$$

By construction, $F$ must be a cumulative distribution function in order to ensure that $F(z) \in [0,1]$. We also assume that the model is symmetric, implying that $F(z) + F(-z) = 1$. Given a sample $\{(x_i, y_i), i = 1, \ldots, n\}$, the log-likelihood function is equal to:

$$\ell(\theta) = \sum_{i=1}^{n} \ln \Pr\{Y_i = y_i\}$$

where $y_i$ takes the values 0 or 1. We have:

$$\Pr\{Y_i = y_i\} = p_i^{y_i} (1 - p_i)^{1 - y_i}$$


33The material presented below is based on surveys by Amemiya (1981, 1985) and McFadden (1984).
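where $p_i = \Pr\{Y_i = 1 \mid X_i = x_i\}$. We deduce that:

$$\ell(\theta) = \sum_{i=1}^{n} y_i \ln p_i + (1 - y_i) \ln(1 - p_i) = \sum_{i=1}^{n} y_i \ln F(x_i^\top \beta) + (1 - y_i) \ln\left(1 - F(x_i^\top \beta)\right)$$

We notice that the vector $\theta$ includes only the parameters $\beta$. By noting $f(z)$ the probability density function, it follows that the associated score vector and Hessian matrix of the log-likelihood function are:

$$S(\beta) = \frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^{n} \frac{\left(y_i - F(x_i^\top \beta)\right) f(x_i^\top \beta)}{F(x_i^\top \beta) F(-x_i^\top \beta)} x_i$$

and:

$$H(\beta) = \frac{\partial^2 \ell(\beta)}{\partial \beta \, \partial \beta^\top} = -\sum_{i=1}^{n} H_i \cdot \left(x_i x_i^\top\right)$$

where:

$$H_i = \frac{f(x_i^\top \beta)^2}{F(x_i^\top \beta) F(-x_i^\top \beta)} - \left(y_i - F(x_i^\top \beta)\right) \left( \frac{f'(x_i^\top \beta)}{F(x_i^\top \beta) F(-x_i^\top \beta)} - \frac{f(x_i^\top \beta)^2 \left(1 - 2 F(x_i^\top \beta)\right)}{F(x_i^\top \beta)^2 F(-x_i^\top \beta)^2} \right)$$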

Once $\beta$ is estimated by the method of maximum likelihood, we can calculate the predicted probability for the $i$-th observation:

$$\hat{p}_i = F(x_i^\top \hat{\beta})$$

Like a linear regression model, we can define the residual as the difference between the observation $y_i$ and the predicted value $\hat{p}_i$. We can also exploit the property that the conditional distribution of $Y_i$ is a Bernoulli distribution $\mathcal{B}(p_i)$. This is why it is better to use the standardized (or Pearson) residuals:

$$\hat{u}_i = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}}$$

These residuals are related to Pearson's chi-squared statistic:

$$\chi^2_{\mathrm{Pearson}} = \sum_{i=1}^{n} \hat{u}_i^2$$
This statistic may be used to measure the goodness-of-fit of the model. Under the assumption $H_0$ that there is no lack-of-fit, we have $\chi^2_{\mathrm{Pearson}} \sim \chi^2_{n-K}$ where $K$ is the number of exogenous variables. Another goodness-of-fit statistic is the likelihood ratio. For the 'saturated' model, the estimated probability $\hat{p}_i$ is exactly equal to $y_i$. We deduce that the likelihood ratio is equal to:

$$-2 \ln \Lambda = 2 \sum_{i=1}^{n} \left( y_i \ln y_i + (1 - y_i) \ln(1 - y_i) \right) - 2 \sum_{i=1}^{n} \left( y_i \ln \hat{p}_i + (1 - y_i) \ln(1 - \hat{p}_i) \right) = 2 \sum_{i=1}^{n} y_i \ln\left(\frac{y_i}{\hat{p}_i}\right) + (1 - y_i) \ln\left(\frac{1 - y_i}{1 - \hat{p}_i}\right)$$

In binomial choice models, $D = -2 \ln \Lambda$ is also called the deviance and we have $D \sim \chi^2_{n-K}$. In a perfect fit $\hat{p}_i = y_i$, the likelihood ratio is exactly equal to zero. The forecasting procedure consists of estimating the probability $\hat{p} = F(x^\top \hat{\beta})$ for a given set of variables $x$ and using the following decision criterion:

$$\hat{Y} = 1 \Leftrightarrow \hat{p} \geq \frac{1}{2}$$


Remark 187 It could also be interesting to compute the marginal effects. We have:

$$\mathbb{E}[Y \mid X = x] = F(x^\top \beta)$$

and:

$$\frac{\partial \, \mathbb{E}[Y \mid X = x]}{\partial x} = f(x^\top \beta) \, \beta$$

The marginal effects depend on the vector $x$ and are then not easy to interpret. This is why we generally compute them by using the mean of the regressors or averaging them across all the observations of the sample.


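Logit analysis The logit model uses the following cumulative distribution function:

$$F(z) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{e^{z} + 1}$$

The probability density function is then equal to:

$$f(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^2}$$

We verify the property $F(z) + F(-z) = 1$. The log-likelihood function is equal to:

$$\ell(\beta) = \sum_{i=1}^{n} (1 - y_i) \ln\left(1 - F(x_i^\top \beta)\right) + y_i \ln F(x_i^\top \beta)$$

$$= \sum_{i=1}^{n} (1 - y_i) \ln\left(1 - \frac{1}{1 + e^{-x_i^\top \beta}}\right) + y_i \ln\left(\frac{1}{1 + e^{-x_i^\top \beta}}\right)$$

$$= -\sum_{i=1}^{n} \left( \ln\left(1 + e^{-x_i^\top \beta}\right) + (1 - y_i) \left(x_i^\top \beta\right) \right)$$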
We also have³⁴:

$$S(\beta) = \sum_{i=1}^{n} \left(y_i - F(x_i^\top \beta)\right) x_i$$

and³⁵:

$$H(\beta) = -\sum_{i=1}^{n} f(x_i^\top \beta) \cdot \left(x_i x_i^\top\right)$$


Probit analysis The probit model assumes that $F(z)$ is the Gaussian distribution. The log-likelihood function is then:

$$\ell(\beta) = \sum_{i=1}^{n} (1 - y_i) \ln\left(1 - \Phi(x_i^\top \beta)\right) + y_i \ln \Phi(x_i^\top \beta)$$

The probit model can be seen as a latent variable model. Let us consider the linear model $Y^\star = \beta^\top X + U$ where $U \sim \mathcal{N}(0, \sigma^2)$. We assume that we do not observe $Y^\star$ but $Y = g(Y^\star)$. For example, if $g(z) = \mathbb{1}\{z > 0\}$, we obtain:

$$\Pr\{Y = 1 \mid X = x\} = \Pr\{\beta^\top X + U > 0 \mid X = x\} = \Phi\left(\frac{\beta^\top x}{\sigma}\right)$$

We notice that only the ratio $\beta/\sigma$ is identifiable. Since we can set $\sigma = 1$, we obtain the probit model.
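As an illustration of the maximum likelihood estimation described above, the following sketch fits a logit model by Newton-Raphson iterations, using the score $\sum_i (y_i - F(x_i^\top \beta)) x_i$ and the Hessian $-\sum_i f(x_i^\top \beta) x_i x_i^\top$, and then computes the Pearson residuals, the deviance and the forecasts. The simulated data, the function names (`logistic`, `fit_logit`) and the stopping rule are assumptions of this sketch, not part of the text.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(x, y, n_iter=25, tol=1e-10):
    """Newton-Raphson for the logit model; x is (n, K), y is (n,) with values 0/1."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = logistic(x @ beta)
        score = x.T @ (y - p)                            # sum_i (y_i - F_i) x_i
        hessian = -(x * (p * (1 - p))[:, None]).T @ x    # -sum_i f_i x_i x_i'
        step = np.linalg.solve(hessian, score)
        beta = beta - step                               # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated sample (hypothetical parameters, for illustration only)
rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, -2.0])
y = (rng.uniform(size=n) < logistic(x @ beta_true)).astype(float)

beta_hat = fit_logit(x, y)
p_hat = logistic(x @ beta_hat)

# Goodness-of-fit statistics and forecasts
pearson = (y - p_hat) / np.sqrt(p_hat * (1 - p_hat))
chi2_pearson = np.sum(pearson ** 2)
# With binary y_i, the saturated log-likelihood is zero, so D = -2 * log-likelihood
deviance = -2 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
y_pred = (p_hat >= 0.5).astype(int)
```

For a probit model, the same loop would apply with $F = \Phi$ and the general expressions of the score and the Hessian, since the simplifications used here are specific to the logistic distribution.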


Regularization Let $\ell(\theta)$ be the log-likelihood function. The regularized log-likelihood function is equal to:

$$\ell(\theta; \lambda) = \ell(\theta) - \frac{\lambda}{p} \|\theta\|_p^p$$

The case $p = 1$ corresponds to the lasso penalization, whereas $p = 2$ corresponds to the ridge regularization. The optimal value $\theta^\star$ is obtained by maximizing the regularized log-likelihood function:

$$\theta^\star(\lambda) = \arg\max \ell(\theta; \lambda)$$

In this problem, we consider $\lambda$ as a hyperparameter, meaning that $\lambda$ is not directly estimated by maximizing the penalized log-likelihood function with respect to $(\theta; \lambda)$. For instance, in the case of the lasso regularization, $\lambda$ can be calibrated in order to obtain a sparse model or by using cross-validation techniques.


34We use the property $f(z) = F(z)(1 - F(z))$, implying that:

$$\frac{f(z)}{F(z) F(-z)} = \frac{f(z)}{F(z)(1 - F(z))} = 1$$

35We use the property $f'(z) = -f(z) F(z) (1 - e^{-z})$, implying that:

$$\frac{f'(z)}{F(z) F(-z)} - \frac{f(z)^2 (1 - 2 F(z))}{F(z)^2 F(-z)^2} = 0$$
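Extension to multinomial logistic regression We assume that $Y$ can take $J$ labels $(\mathcal{L}_1, \ldots, \mathcal{L}_J)$ or belongs to $J$ disjoint classes $(\mathcal{C}_1, \ldots, \mathcal{C}_J)$. We define the conditional probability as follows:

$$p_j(x) = \Pr\{Y = \mathcal{L}_j \mid X = x\} = \Pr\{Y \in \mathcal{C}_j \mid X = x\} = \frac{e^{\beta_j^\top x}}{1 + \sum_{j'=1}^{J-1} e^{\beta_{j'}^\top x}}$$

for $j = 1, \ldots, J - 1$. The probability of the last label is then equal to:

$$p_J(x) = 1 - \sum_{j=1}^{J-1} p_j(x) = \frac{1}{1 + \sum_{j=1}^{J-1} e^{\beta_j^\top x}}$$

We verify that $0 < p_j(x) < 1$ for all $j = 1, \ldots, J$. The log-likelihood function becomes:

$$\ell(\theta) = \sum_{i=1}^{n} \ln \prod_{j=1}^{J} p_j(x_i)^{\mathbb{1}\{y_i = \mathcal{L}_j\}}$$

where $\theta$ is the vector of parameters $(\beta_1, \ldots, \beta_{J-1})$.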
The multinomial logistic model can be formulated as a log-linear model. We note:

$$\ln p_j(x) = \beta_0 + \beta_j^\top x$$

Since we have $\sum_{j=1}^{J} p_j(x) = 1$, we deduce that the constant $\beta_0$ is given by:

$$\sum_{j=1}^{J} e^{\beta_0 + \beta_j^\top x} = 1 \Leftrightarrow \beta_0 = -\ln \sum_{j=1}^{J} e^{\beta_j^\top x}$$

It follows that:

$$p_j(x) = \frac{e^{\beta_j^\top x}}{\sum_{j'=1}^{J} e^{\beta_{j'}^\top x}}$$

This function is known as the softmax function and plays an important role in neural networks. We also notice that the model is overidentified because the sum of probabilities is equal to 1. However, if we use the parametrization $\tilde{\beta}_j = \beta_j - \beta_J$, we obtain the previous model³⁶, which is just identified.
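The equivalence between the softmax formulation and the just-identified parametrization $\tilde{\beta}_j = \beta_j - \beta_J$ can be checked numerically. The sketch below is only an illustration: the function name `softmax`, the dimensions and the random parameters are assumptions, and the subtraction of the maximum is a standard numerical safeguard that does not change the probabilities.

```python
import numpy as np

def softmax(v):
    """Softmax over the last axis, shifted by the maximum for numerical stability."""
    e = np.exp(v - np.max(v, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
J, K = 4, 3
beta = rng.normal(size=(J, K))   # overidentified parameters (beta_1, ..., beta_J)
x = rng.normal(size=K)

# Softmax probabilities p_j(x) = exp(beta_j' x) / sum_j' exp(beta_j'' x)
p_softmax = softmax(beta @ x)

# Just-identified parametrization: beta_tilde_j = beta_j - beta_J (so beta_tilde_J = 0)
beta_tilde = beta - beta[-1]
num = np.exp(beta_tilde[:-1] @ x)
p_ref = np.append(num, 1.0) / (1.0 + num.sum())
```

Both vectors contain the same $J$ probabilities, which illustrates that shifting all the $\beta_j$ by the same vector leaves the model unchanged.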


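36Indeed, we have:

$$p_j(x) = \frac{e^{(\beta_j - \beta_J)^\top x}}{\sum_{j'=1}^{J} e^{(\beta_{j'} - \beta_J)^\top x}} = \frac{e^{\tilde{\beta}_j^\top x}}{1 + \sum_{j'=1}^{J-1} e^{\tilde{\beta}_{j'}^\top x}}$$

because $e^{\tilde{\beta}_J^\top x} = e^{(\beta_J - \beta_J)^\top x} = 1$.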
15.2.3 Non-parametric supervised methods


We have named this section ‘non-parametric supervised methods’ in order to group some 
approaches that share some of the same characteristics. First, even if some of them are para- 
metric, these models are highly non-linear, meaning that it is extremely difficult to interpret 
these models. In this case, ‘forecasting’ is the main motivation and is more important than 
‘modeling’. Second, it would be illusory to attempt statistical inference. Most of
the time, it is impossible to calculate the variance of the parameters and the associated t- 
statistics. Therefore, the term ‘model calibration’ is more appropriate than the term ‘model 
estimation’. Finally, the number of parameters or unknowns can be large. 

If we consider the linear regression model, we have $Y = \beta^\top X + u$ where $(Y, X)$ forms a random vector. If we consider an observation $i$, we have $y_i = f(x_i) + u_i$ where $f(x_i) = \sum_{k=1}^{K} \beta_k x_{i,k}$. Let us now consider some non-linear features. We can replace the linear function by:

$$f(x_i) = \sum_{k=1}^{K} \beta_k \phi_k(x_i) = \beta^\top \phi(x_i)$$

For example, we can use quadratic, cubic or piecewise features. We abandon the framework of Gaussian conditional distribution, which is the basis of linear regression, and the reference to the random variables $X$ and $Y$ is not necessary. This means that the calibrated parameters $(\hat{\beta}_1, \ldots, \hat{\beta}_K)$ are less relevant. Only the calibrated function $\hat{f}(x)$ is important. For instance, if we use radial basis functions:

$$\phi_k(x) = \exp\left(-\frac{1}{2} \|x - c_k\|^2\right)$$

where $c_k$ is the centering parameter, we obtain:

$$f(x_i) = \sum_{k=1}^{K} \beta_k e^{-\frac{1}{2} \|x_i - c_k\|^2}$$

Even if $f(x)$ is a parametric function, it can be considered as a non-parametric model. Indeed, the functional form is the relevant quantity, not the parameters.
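To make the radial basis function example concrete, the sketch below builds the features $\phi_k(x) = \exp(-\frac{1}{2}\|x - c_k\|^2)$ and calibrates the coefficients by ordinary least squares. The target function, the noise level and the grid of centers are hypothetical choices made for illustration; only the functional form of $\phi_k$ comes from the text.

```python
import numpy as np

def rbf_features(x, centers):
    """phi_k(x) = exp(-0.5 * ||x - c_k||^2) for each row of x and each center c_k."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2)

rng = np.random.default_rng(2)
x = np.linspace(-4, 4, 201)[:, None]              # inputs, shape (n, 1)
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=201)  # hypothetical noisy target
centers = np.linspace(-4, 4, 9)[:, None]          # hypothetical centers c_k

phi = rbf_features(x, centers)                    # design matrix, shape (n, K)
beta_hat, *_ = np.linalg.lstsq(phi, y, rcond=None)
f_hat = phi @ beta_hat                            # calibrated function on the sample
rmse = np.sqrt(np.mean((f_hat - y) ** 2))
```

The individual coefficients in `beta_hat` have no direct interpretation; the object of interest is the fitted function `f_hat`, in line with the remark that the functional form matters more than the parameters.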


15.2.3.1 k-nearest neighbor classifier 


The k-NN algorithm is one of the simplest non-parametric models. Let $\{(x_i, y_i)\}$ be the training sample of size $n$. We assume that the labels $y_i$ can be assigned to $J$ classes $(\mathcal{C}_1, \ldots, \mathcal{C}_J)$. The goal is to assign a label $y$ to a given unlabeled observation $x$. For that, we select the $k$ closest labeled observations in the training sample and we find the label $\hat{y}$ that appears most frequently within the $k$-subset. Said differently, the k-NN classifier uses the majority vote of the $k$ closest neighbors and the classification rule depends on $k$, which is the hyperparameter. It is obvious that a high value of $k$ helps to smooth the decision regions, but it increases the computational complexity. Moreover, there is a trade-off between bias and variance. If $k = 1$, we assign to $x$ the label of the input $x_i$ that is the closest. If $k = n$, we assign to $x$ the most frequent label of the training sample. In the first case, we see that $\hat{y}$ is an unbiased estimator of $y$, but its variance is large. In the second case, the estimator is biased but it has a small variance.

The implementation of the k-NN algorithm requires defining the distance between the points $x_i$ and $x_j$. Generally, we use the Euclidean distance, but we can consider the Minkowski distance. To find the $k$ closest labeled observations, the simplest way is the brute-force approach. When the number of observations $n$ is large, we can use more efficient methods based on tree-based partition³⁷.



FIGURE 15.21: Illustration of the k-NN classifier 


We consider the non-linearly separable classification problem, where the classes are distributed in rings around the point (0,0):

$$\mathcal{C}_j = \left\{ (x_{i,1}, x_{i,2}) \in \mathbb{R}^2 : r_{j-1}^2 \leq x_{i,1}^2 + x_{i,2}^2 < r_j^2 \right\}$$

where $j \in \{1, 2, 3\}$ and $r_j = j$ is the radius of the ring. In the first panel in Figure 15.21, we have represented the three rings, and we have reported 100 simulated observations $(x_{i,1}, x_{i,2})$ that form the training set. In the second panel, we consider 1000 observations. Solutions provided by 1-NN and 10-NN classifiers are given in the third and fourth panels. We notice that the 10-NN classifier performs worse than the 1-NN classifier, because the number of closest neighbors is large compared to the number of observations in the training set.
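A brute-force version of the k-NN classifier on the ring problem can be sketched as follows. The sampling scheme (points drawn uniformly in the disk of radius 3), the seed and the function names are assumptions of this sketch; the text does not specify how the observations of Figure 15.21 were simulated, so the accuracies obtained here are not meant to reproduce the figure.

```python
import numpy as np

def ring_label(x):
    """Class j in {1, 2, 3} according to the ring that contains the point (r_j = j)."""
    return np.clip(np.ceil(np.linalg.norm(x, axis=1)), 1, 3).astype(int)

def simulate(n, rng):
    """Draw n points uniformly in the disk of radius 3 (an assumption of this sketch)."""
    r = 3.0 * np.sqrt(rng.uniform(size=n))
    a = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x = np.column_stack([r * np.cos(a), r * np.sin(a)])
    return x, ring_label(x)

def knn_predict(x_train, y_train, x_new, k):
    """Majority vote among the k closest training points (Euclidean distance, brute force)."""
    d = np.linalg.norm(x_new[:, None, :] - x_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]        # indices of the k nearest neighbors
    votes = y_train[idx]                      # labels of these neighbors
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(3)
x_train, y_train = simulate(100, rng)
x_test, y_test = simulate(1000, rng)

# Test accuracy for several values of the hyperparameter k
accuracy = {k: np.mean(knn_predict(x_train, y_train, x_test, k) == y_test)
            for k in (1, 10, 100)}
```

With $k = n = 100$, every test point receives the most frequent label of the training sample, which is the high-bias, low-variance limit described above.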


Remark 188 We can apply the k-NN algorithm to the regression. In this case, the predicted value $\hat{y}$ is the (weighted) average of the values $y_i$ of the $k$ closest neighbors³⁸.


15.2.3.2 Neural networks 


Neural networks as non-linear models We have seen that we can extend the linear model as follows³⁹:

$$y_i = \beta_0 + \beta^\top \phi(x_i) + \varepsilon_i$$

In this case, we transform the input data $(x_{i,1}, \ldots, x_{i,K})$ into the auxiliary data $(z_{i,1}, \ldots, z_{i,K})$ where $z_{i,k} = \phi_k(x_{i,k})$. Here, the non-linearity property is introduced thanks


37The two most famous methods are the K-D tree and ball tree algorithms. 
38The weight is generally inversely proportional to the distance. 
39Here, we include a constant in the model. 
to the non-linear function $\phi_k$. However, there are many other ways to build a non-linear model. For instance, we can assume that:

$$y_i = \phi\left(\beta_0 + \beta^\top x_i\right) + \varepsilon_i$$

We first create the auxiliary data $z_i = \beta_0 + \beta^\top x_i$ from the inputs and then apply the non-linear function $\phi$. If we use several non-linear functions, we obtain:

$$y_i = \sum_{j=1}^{J} \gamma_j \phi_j\left(\beta_{0,j} + \beta_j^\top x_i\right) + \varepsilon_i$$

or:

$$y_i = \phi\left(\gamma_0 + \sum_{j=1}^{J} \gamma_j \phi_j\left(\beta_{0,j} + \beta_j^\top x_i\right)\right) + \varepsilon_i = f(x_i) + \varepsilon_i$$

The underlying idea of neural networks is to define a non-linear function $f(x)$, which is sufficiently flexible to fit complex relationships.

FIGURE 15.22: The perceptron 


Neural networks as a mathematical representation of biological systems The term ‘neural network’ makes reference to biological systems, in particular the brain. For instance, Rosenblatt (1958) proposed a self-organizing and adaptive model called the perceptron. It is no coincidence that the title of this publication is “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain”. We have represented the perceptron in Figure 15.22. The input data are combined in order to produce a hidden variable $z = \sum_{k} \beta_k x_k$. Then, we apply the function $f(z)$ in order to obtain the output $y$:

$$y = f(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases}$$

In the context of neural networks, the function $f(z)$ is called the activation function and $z$ is the hidden unit. If we generalize the perceptron by considering different hidden units, we obtain the artificial neural network described in Figure 15.23. In this example, we have four input units, five hidden units and one output unit. This model is also called a feed-forward neural network with one hidden layer. It can be extended in two directions.
FIGURE 15.23: Feed-forward neural network with a single hidden layer 


FIGURE 15.24: Feed-forward neural network with two hidden layers and three output units
First, we can consider several output units. Second, we can use several hidden layers. In this case, we speak of multi-layer neural networks (Figure 15.24). For example, deep learning refers to a neural network with a large number of hidden layers.

The term neural network does not only refer to the structure input layer — hidden layer 
— output layer. Activation functions generally map the resulting values into the range [0, 1] 
or [—1,1] and correspond to sigmoidal functions. For example, the perceptron uses the 
Heaviside step function f (z) = 1{z > 0}, because it indicates if the neuron is activated or 
not. We can also use the sign function f (z) = sign (z) in order to indicate a ‘positive’ or 
‘negative’ potential. However, the most popular activation functions are continuous: 


1. the logistic function is equal to:

$$f(z) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{e^{z} + 1} \tag{15.16}$$

we have $f(z) \in [0,1]$, meaning that we can interpret $f(z)$ as a probability function; moreover, it is symmetric about 0.5;

2. the hyperbolic tangent function is defined by:

$$f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = \frac{e^{2z} - 1}{e^{2z} + 1} \tag{15.17}$$

and we have $f(z) \in [-1, 1]$;

3. the rectified linear unit (ReLU) function corresponds to:

$$f(z) = \max(0, z) \tag{15.18}$$

and we have $f(z) \in [0, \infty)$.
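The three activation functions (15.16), (15.17) and (15.18) can be written in a few lines. The sketch below is illustrative (the function names are assumptions) and also evaluates two elementary identities: the symmetry $f(z) + f(-z) = 1$ of the logistic function and the relationship $\tanh(z) = 2 f(2z) - 1$ between the hyperbolic tangent and the logistic function $f$.

```python
import numpy as np

def logistic(z):
    """Logistic function (15.16)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent (15.17), written with the second expression of the text."""
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

def relu(z):
    """Rectified linear unit (15.18)."""
    return np.maximum(0.0, z)

z = np.linspace(-5, 5, 11)
symmetry_gap = np.max(np.abs(logistic(z) + logistic(-z) - 1.0))
tanh_gap = np.max(np.abs(tanh(z) - (2.0 * logistic(2.0 * z) - 1.0)))
```

Both gaps are zero up to floating-point rounding, which shows that the hyperbolic tangent is a rescaled and recentered logistic function.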


Furthermore, neural networks are also characterized by the concept of learning algorithms. Neural networks can be seen as non-linear functions with some unknown parameters. The first idea is then to estimate the parameters by minimizing the residual sum of squares, meaning that neural networks are just a particular case of non-linear least squares. However, neural networks generally use other techniques for identifying the parameters. Statistical learning implicitly refers to human brains or natural neural networks. The concept of learning is then central and contrasts with the concept of optimization. The latter implies that there is one solution. In an artificial neural network, each node represents a neuron and each connection can be seen as a synapse. Since these connections transmit a signal from one neuron to another, the underlying idea is that they learn like in a human brain. This is why the parameters that control these connections are updated until the artificial neural network has learnt. In fact, the difference between optimization and learning is somewhat forced. Indeed, optimization also uses iterative algorithms that can be interpreted as learning algorithms. However, the learning algorithms that are used in artificial neural networks try to imitate the learning process of human brains⁴⁰. They are also called adaptive learning rules in order to say that they are adaptive and they try to learn.


Remark 189 According to Bishop (2006), the term neural network “has been used very 
broadly to cover a wide range of different models, many of which have been the subject of 
exaggerated claims regarding their biological plausibility”. In fact, we use neural networks 
as non-linear regression models in the sequel. 


40It is particularly true for the first generation of algorithms that were discovered before the 1990s.
Structure of the canonical neural network The notations used in neural networks and machine learning are generally different from those used in statistics. In this book, we have tried to use homogeneous notations in order to make the reading easier:


• the observation (also called the example) is denoted by $i$;


• the input variable uses the index $k$ and $x_{i,k}$ is the $k$-th input variable of the $i$-th observation;


• $z_{i,h}$ is the value taken by the $h$-th hidden variable and the $i$-th observation;


• for the output variables (also called the patterns), we introduce the notation $y_j(x_i)$ to name the model output taken by the $j$-th output variable and the $i$-th observation; sometimes, we use the alternative notation $\hat{y}_{i,j}$, which is more traditional in statistical inference theory.


The numbers of input, hidden and output variables are respectively equal to $n_x$, $n_z$ and $n_y$. The activation functions $f_{x,z}$ and $f_{z,y}$ link respectively the $x$'s to the $z$'s, and the $z$'s to the $y$'s. In order to distinguish them, $f_{z,y}$ is also called the output scaling function. We have⁴¹:


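$$z_{i,h} = f_{x,z}(u_{i,h}) = f_{x,z}\left(\sum_{k=1}^{n_x} \beta_{h,k} x_{i,k}\right)$$

and:

$$y_j(x_i) = f_{z,y}(v_{i,j}) = f_{z,y}\left(\sum_{h=1}^{n_z} \gamma_{j,h} z_{i,h}\right)$$

where $u_{i,h}$ and $v_{i,j}$ are the intermediary variables before the activation of the functions $f_{x,z}$ and $f_{z,y}$. Finally, we have:

$$y_j(x_i) = f_{z,y}\left(\sum_{h=1}^{n_z} \gamma_{j,h} f_{x,z}\left(\sum_{k=1}^{n_x} \beta_{h,k} x_{i,k}\right)\right) \tag{15.19}$$

Figure 15.25 summarizes the structure and the notations of this neural network.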
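The forward computation of Equation (15.19) can be sketched in matrix form as follows. The array layout (observations in rows, `beta` of shape $(n_z, n_x)$, `gamma` of shape $(n_y, n_z)$), the function name `forward` and the default activation functions are assumptions of this sketch; the loop at the end recomputes one output directly from (15.19) as a cross-check.

```python
import numpy as np

def forward(x, beta, gamma, f_xz=np.tanh, f_zy=lambda v: v):
    """Forward pass of Equation (15.19).

    x: (n, n_x) inputs, beta: (n_z, n_x), gamma: (n_y, n_z).
    Returns the intermediary variables u, v and the outputs z, y.
    """
    u = x @ beta.T      # u[i, h] = sum_k beta[h, k] * x[i, k]
    z = f_xz(u)         # z[i, h] = f_xz(u[i, h])
    v = z @ gamma.T     # v[i, j] = sum_h gamma[j, h] * z[i, h]
    y = f_zy(v)         # y[i, j] = f_zy(v[i, j])
    return u, z, v, y

# Four input units, five hidden units and one output unit, as in Figure 15.23
rng = np.random.default_rng(4)
n, n_x, n_z, n_y = 5, 4, 5, 1
x = rng.normal(size=(n, n_x))
beta = rng.normal(size=(n_z, n_x))
gamma = rng.normal(size=(n_y, n_z))
u, z, v, y = forward(x, beta, gamma)

# Loop version of (15.19) for one observation i and one output j
i, j = 2, 0
y_loop = sum(gamma[j, h] * np.tanh(sum(beta[h, k] * x[i, k] for k in range(n_x)))
             for h in range(n_z))
```

Writing the two layers as matrix products avoids explicit loops over observations and units, which is the usual way such networks are evaluated in practice.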


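Remark 190 Including a constant is equivalent to consider that $x_{i,1} = 1$. A variant model is to define $y_j(x_i)$ as follows:

$$y_j(x_i) = f_{z,y}\left(\gamma_{j,0} + \sum_{h=1}^{n_z} \gamma_{j,h} f_{x,z}\left(\beta_{h,0} + \sum_{k=1}^{n_x} \beta_{h,k} x_{i,k}\right)\right) \tag{15.20}$$

In this case, we add a constant as an input variable ($\beta_{h,0}$) and a constant as a hidden variable ($\gamma_{j,0}$). Bishop (2006) shows that this model can be written as:

$$y_j(x_i) = f_{z,y}\left(\sum_{h=0}^{n_z} \gamma_{j,h} f_{x,z}\left(\sum_{k=0}^{n_x} \beta_{h,k} x_{i,k}\right)\right)$$

where $x_{i,0} = 1$. The other possibility is to have a direct link between the $x$'s and the $y$'s, or skip-layer connections:

$$y_j(x_i) = f_{z,y}\left(\gamma_{j,0} + \sum_{h=1}^{n_z} \gamma_{j,h} f_{x,z}\left(\beta_{h,0} + \sum_{k=1}^{n_x} \beta_{h,k} x_{i,k}\right) + \sum_{k=1}^{n_x} \gamma_{j,n_z+k} x_{i,k}\right) \tag{15.21}$$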


41Most of the time, we use the same activation function $f_{x,z}(u) = f_{z,y}(u) = f(u)$.

FIGURE 15.25: Canonical neural network 


Loss function If we note $y_{i,j}$ the value of the output variable that is observed⁴², we would like to verify:

$$y_j(x_i) = y_{i,j}$$

It follows that a natural loss function is the sum of squared errors:

$$\mathcal{L}(\theta) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n_y} \left(y_j(x_i) - y_{i,j}\right)^2 \tag{15.22}$$

where $\theta$ is the vector of parameters and $n$ is the number of observations. Minimizing this loss function is also equivalent to maximizing the log-likelihood function associated to the non-linear regression model:

$$y_{i,j} = y_j(x_i) + \varepsilon_{i,j}$$

where $\varepsilon_{i,j} \sim \mathcal{N}(0, \sigma^2)$ and $\varepsilon_{i,j} \perp \varepsilon_{i',j'}$ if $i \neq i'$ or $j \neq j'$.

The previous loss function is natural when considering a non-linear regression. In the case of binary classification ($y_i = 0$ or $y_i = 1$) and if the output $y(x_i)$ represents a probability, it is better to use the cross-entropy error loss⁴³:

$$\mathcal{L}(\theta) = -\sum_{i=1}^{n} \left( y_i \ln y(x_i) + (1 - y_i) \ln\left(1 - y(x_i)\right) \right) \tag{15.23}$$


42It is called the target value or the pattern.
43We skip the subscript $j$ because we assume that $j = 1$. We have then $y_i = y_{i,1}$ and $y(x_i) = y_1(x_i)$.
The choice of the loss function depends then on the output variable, but also on the activation function. For example, the cross-entropy error loss is appropriate if $f_{z,y}$ corresponds to the logistic function, but not to the hyperbolic tangent function. In the case of the multi-class classification problem, Bishop (2006) proposes to consider the following loss function:

$$\mathcal{L}(\theta) = -\sum_{i=1}^{n} \sum_{j=1}^{n_y} y_{i,j} \ln y_j(x_i) \tag{15.24}$$

where $n_y$ is equal to the number of classes $n_{\mathcal{C}}$ and $f_{z,y}$ corresponds to the softmax function that was previously defined in the case of the multinomial logistic model:

$$y_j(x_i) = f_{z,y}(v_{i,j}) = \frac{e^{v_{i,j}}}{\sum_{j'=1}^{n_{\mathcal{C}}} e^{v_{i,j'}}}$$

The loss function is then the opposite of the log-likelihood function.
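The three loss functions (15.22), (15.23) and (15.24) can be sketched as follows. The function names and the one-hot encoding of the targets $y_{i,j}$ are assumptions of this sketch. The example at the end uses the fact that, with two classes and intermediary variables $(v, 0)$, the softmax probability of the first class is the logistic function of $v$, so the multi-class loss reduces to the binary cross-entropy.

```python
import numpy as np

def sse_loss(y_model, y_obs):
    """Sum of squared errors, Equation (15.22)."""
    return 0.5 * np.sum((y_model - y_obs) ** 2)

def binary_cross_entropy(y_model, y_obs):
    """Cross-entropy error loss, Equation (15.23); y_model must lie in (0, 1)."""
    return -np.sum(y_obs * np.log(y_model) + (1 - y_obs) * np.log(1 - y_model))

def multiclass_cross_entropy(v, y_onehot):
    """Equation (15.24) with a softmax output; v and y_onehot have shape (n, n_C)."""
    v = v - v.max(axis=1, keepdims=True)   # shift for numerical stability
    log_softmax = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -np.sum(y_onehot * log_softmax)

# Two-class check: softmax over (v, 0) gives the logistic probability of class 1
v = np.array([[0.3, 0.0]])
y_onehot = np.array([[1.0, 0.0]])
p = 1.0 / (1.0 + np.exp(-0.3))
loss_multi = multiclass_cross_entropy(v, y_onehot)
loss_binary = binary_cross_entropy(np.array([p]), np.array([1.0]))
```

Working with the logarithm of the softmax directly, rather than taking the logarithm of computed probabilities, is a design choice that avoids overflow when the intermediary variables are large.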


Learning rules In order to minimize the loss function, we can use classical optimization algorithms⁴⁴ (Newton-Raphson, conjugate gradient, BFGS, DFP, Levenberg-Marquardt, etc.). As we have already said previously, this is not the philosophy of neural networks, and we generally prefer to use a statistical learning rule, which corresponds to an iterative algorithm:

$$\theta^{(t+1)} = \theta^{(t)} + \Delta\theta^{(t)}$$

where $\theta^{(t)}$ is the value of $\theta$ at the iteration (or epoch) $t$, and $\Delta\theta^{(t)}$ is the adjustment vector. The learning rule consists in defining how $\theta^{(t)}$ is updated, and is mostly based on the gradient of the loss function:

$$G(\theta^{(t)}) = \frac{\partial \mathcal{L}(\theta^{(t)})}{\partial \theta}$$

Here are the main methods (Smith, 1993):

• The steepest descent method is defined by:

$$\Delta\theta^{(t)} = -\eta \cdot G(\theta^{(t)})$$

where $\eta > 0$ is the learning rate parameter. For minimizing the loss function, $\Delta\theta^{(t)}$ should go in the opposite direction of the gradient.


• For the momentum method, we have:

$$\Delta\theta^{(t)} = -(1 - \alpha_m)\, \eta \cdot G(\theta^{(t)}) + \alpha_m \cdot \Delta\theta^{(t-1)} = -\eta_m \cdot G(\theta^{(t)}) + \alpha_m \cdot \Delta\theta^{(t-1)}$$

where $\alpha_m \in [0,1]$ is the momentum weight and $\eta_m > 0$ is the momentum learning rate parameter. Therefore, the adjustment at iteration $t$ is the weighted average of the adjustment at iteration $t - 1$ and the steepest descent adjustment. The underlying idea of the term $\alpha_m \Delta\theta^{(t-1)}$ is to keep going in the previous direction. This method can speed up the algorithm because it may avoid oscillations⁴⁵.


44See Appendix A.1.3 on page 1046.
45A better method is to consider the Nesterov approach:

$$\Delta\theta^{(t)} = -\eta_m \cdot G\left(\theta^{(t)} + \alpha_m \Delta\theta^{(t-1)}\right) + \alpha_m \cdot \Delta\theta^{(t-1)}$$
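• The adaptive learning method is given by:

$$\Delta\theta^{(t)} = -\eta^{(t)} \cdot G(\theta^{(t)})$$

where:

$$\eta^{(t)} = \begin{cases} \eta^{(t-1)} + \kappa & \text{if } G(\theta^{(t)}) \cdot K(\theta^{(t)}) > 0 \\ \varphi \cdot \eta^{(t-1)} & \text{otherwise} \end{cases}$$

$\kappa > 0$, $0 < \varphi < 1$ and $K(\theta^{(t)}) = G(\theta^{(t-1)})$. Instead of using a fixed step $\eta$, we consider a variable step $\eta^{(t)}$ that depends on the previous value $\eta^{(t-1)}$. The variable step increases when the gradient does not change direction between iterations $t - 1$ and $t$, and decreases otherwise. Another rule is to consider a moving average of the gradient:

$$K(\theta^{(t)}) = (1 - \varrho)\, G(\theta^{(t-1)}) + \varrho\, K(\theta^{(t-1)}) = (1 - \varrho) \sum_{\tau=0}^{\infty} \varrho^{\tau} G(\theta^{(t-\tau-1)})$$

where $\varrho \in [0, 1]$. In the case $\varrho = 0$, we retrieve the previous rule $K(\theta^{(t)}) = G(\theta^{(t-1)})$.


• The adaptive learning with momentum method combines the two previous approaches:

$$\Delta\theta^{(t)} = -(1 - \alpha_m)\, \eta^{(t)} \cdot G(\theta^{(t)}) + \alpha_m \cdot \Delta\theta^{(t-1)} = -\eta_m^{(t)} \cdot G(\theta^{(t)}) + \alpha_m \cdot \Delta\theta^{(t-1)}$$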
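The steepest descent and momentum rules above can be compared on a toy problem. The sketch below uses a hypothetical ill-conditioned quadratic loss $\mathcal{L}(\theta) = \frac{1}{2} \theta^\top A \theta$, whose gradient is $A\theta$; the matrix `A`, the step sizes and the number of iterations are arbitrary choices made for illustration and are not the default values of the text.

```python
import numpy as np

A = np.diag([1.0, 25.0])   # hypothetical quadratic loss L(theta) = 0.5 * theta' A theta

def grad(theta):
    """Gradient G(theta) of the quadratic loss."""
    return A @ theta

def steepest_descent(theta, eta, n_iter):
    for _ in range(n_iter):
        theta = theta + (-eta * grad(theta))               # delta = -eta * G(theta)
    return theta

def momentum(theta, eta_m, alpha_m, n_iter):
    delta = np.zeros_like(theta)
    for _ in range(n_iter):
        delta = -eta_m * grad(theta) + alpha_m * delta      # momentum adjustment
        theta = theta + delta
    return theta

theta_0 = np.array([1.0, 1.0])
theta_sd = steepest_descent(theta_0, eta=0.03, n_iter=200)
theta_mom = momentum(theta_0, eta_m=0.03, alpha_m=0.5, n_iter=200)
dist_sd = np.linalg.norm(theta_sd)     # distance to the minimum theta = 0
dist_mom = np.linalg.norm(theta_mom)
```

In this particular setting, the momentum iterates end closer to the minimum than the steepest descent iterates after the same number of steps, because the memory term accelerates the progress along the flat direction of the loss.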


There are numerous other algorithms⁴⁶ (adagrad, adam, nadam, rprop, rmsprop, etc.), and a lot of tricks for accelerating the convergence. First, we distinguish three approaches for evaluating the gradient of the objective function:


1. the batch gradient descent (BGD) computes the gradient with respect to the entire training dataset:

$$G(\theta^{(t)}) = \frac{\partial \mathcal{L}(\theta^{(t)})}{\partial \theta}$$

2. the stochastic gradient descent (SGD) considers only one different training example at each iteration:

$$G(\theta^{(t)}) = \frac{\partial \mathcal{L}_i(\theta^{(t)})}{\partial \theta}$$

where $\mathcal{L}_i(\theta)$ is the loss function for the $i$-th observation;


3. the mini-batch gradient descent (MGD) updates the parameters by using a subset of the training dataset:

$$G(\theta^{(t)}) = \sum_{i \in \mathcal{S}^{(t)}} \frac{\partial \mathcal{L}_i(\theta^{(t)})}{\partial \theta}$$

where the subset $\mathcal{S}^{(t)}$ changes at each iteration.


The underlying idea is to evaluate the gradient not with respect to the current value $\theta^{(t)}$, but with respect to the prediction of the future value $\theta^{(t+1)}$. This prediction is equal to $\hat{\theta}^{(t+1)} = \theta^{(t)} + \alpha_m \Delta\theta^{(t-1)}$ in the momentum method.

46See Ruder (2016) for a review of recent approaches. 
It is obvious that the choice of one approach depends on the size of the training data. Moreover, we better understand why the momentum approach is important when defining the learning rule. Indeed, in SGD and MGD approaches, the estimation of the gradient is noisier than in the BGD approach. The momentum method helps to smooth the gradient and to obtain a more consistent direction.
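The difference between the three ways of evaluating the gradient can be illustrated on a simple least squares loss $\mathcal{L}_i(\theta) = \frac{1}{2}(x_i^\top \theta - y_i)^2$. The data, the helper `grad_subset` and the choice of ten mini-batches are assumptions of this sketch. It shows that the mini-batch gradients computed on a partition of the sample add up to the batch gradient, whereas a single SGD gradient is only one noisy term of that sum.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 200, 3
x = rng.normal(size=(n, K))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def grad_subset(theta, idx):
    """Sum over the observations idx of the gradients of L_i = 0.5 * (x_i' theta - y_i)^2."""
    resid = x[idx] @ theta - y[idx]
    return x[idx].T @ resid

theta = np.zeros(K)
g_batch = grad_subset(theta, np.arange(n))                  # BGD: entire training set
g_sgd = grad_subset(theta, np.array([rng.integers(n)]))     # SGD: one observation
batches = np.array_split(rng.permutation(n), 10)            # MGD: subsets S^(t)
g_mini = [grad_subset(theta, s) for s in batches]
g_mini_total = sum(g_mini)
```

In practice, each mini-batch gradient is used for one parameter update, so the parameters change between two subsets; the identity above only holds when all gradients are evaluated at the same value of $\theta$.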


We give here some default values that are used for the learning rules: n = 1, &m = 0.75, 
k = 0.1, 6=0.9, am = 0.6 and o = 0.5. However, these parameters can change during the 
learning process. For instance, the learning rate parameter $\eta$ can be greater at the beginning
of the learning process, because of the large gradients. In a similar way, we can use a small
momentum parameter $\alpha_m$, and then we can increase it progressively. We can also assume
that the appropriate learning rules can vary between the parameters. 


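Error backpropagation In order to calculate the gradient $G(\theta)$, we consider a method called error backpropagation (or backward propagation). In the case of the loss function (15.22), we have $\mathcal{L}(\theta) = \sum_{i=1}^{n} \mathcal{L}_i(\theta)$ and:

$$\mathcal{L}_i(\theta) = \frac{1}{2} \sum_{j=1}^{n_y} \left(y_j(x_i) - y_{i,j}\right)^2$$

It follows that $G(\theta) = \sum_{i=1}^{n} G_i(\theta)$ where $G_i(\theta)$ is the gradient of $\mathcal{L}_i(\theta)$. In the case of the previous loss, we can use the decomposition $\mathcal{L}_i(\theta) = \sum_{j=1}^{n_y} \mathcal{L}_{i,j}(\theta)$ where:

$$\mathcal{L}_{i,j}(\theta) = \frac{1}{2} \left(y_j(x_i) - y_{i,j}\right)^2$$

Using the chain rule, we obtain⁴⁷:

$$\frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial \gamma_{j,h}} = \frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial y_j(x_i)} \cdot \frac{\partial y_j(x_i)}{\partial v_{i,j}} \cdot \frac{\partial v_{i,j}}{\partial \gamma_{j,h}} = \left(y_j(x_i) - y_{i,j}\right) f'_{z,y}(v_{i,j})\, z_{i,h}$$

and $\partial_{\gamma_{j',h}} \mathcal{L}_{i,j}(\theta) = 0$ when $j \neq j'$. We also deduce that⁴⁸:

$$\frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial \beta_{h,k}} = \frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial z_{i,h}} \cdot \frac{\partial z_{i,h}}{\partial u_{i,h}} \cdot \frac{\partial u_{i,h}}{\partial \beta_{h,k}} = \frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial z_{i,h}} f'_{x,z}(u_{i,h})\, x_{i,k} = \left(y_j(x_i) - y_{i,j}\right) f'_{z,y}(v_{i,j})\, \gamma_{j,h}\, f'_{x,z}(u_{i,h})\, x_{i,k}$$

In the case of Model (15.20), there is a constant and we have:

$$\frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial \gamma_{j,0}} = \left(y_j(x_i) - y_{i,j}\right) f'_{z,y}(v_{i,j})$$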

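47The distinction between $j$ and $j'$ is important when we consider the softmax function (see Exercise 15.4.7 on page 1025).
48Because we have:

$$\frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial z_{i,h}} = \frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial y_j(x_i)} \cdot \frac{\partial y_j(x_i)}{\partial v_{i,j}} \cdot \frac{\partial v_{i,j}}{\partial z_{i,h}} = \left(y_j(x_i) - y_{i,j}\right) f'_{z,y}(v_{i,j})\, \gamma_{j,h}$$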



and:

$$\frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial \beta_{h,0}} = \left(y_j(x_i) - y_{i,j}\right) f'_{z,y}(v_{i,j})\, \gamma_{j,h}\, f'_{x,z}(u_{i,h})$$

In the case of Model (15.21), we have for the direct links:

$$\frac{\partial \mathcal{L}_{i,j}(\theta)}{\partial \gamma_{j,n_z+k}} = \left(y_j(x_i) - y_{i,j}\right) f'_{z,y}(v_{i,j})\, x_{i,k}$$

It follows that the calculation in a neural network consists of two steps. The forward propagation computes $u_{i,h}$, $z_{i,h}$, $v_{i,j}$ and $y_j(x_i)$, meaning that the information goes from left to right. The backward propagation computes all the derivatives using the chain rule, implying that the information goes from right to left.


Remark 191 All the previous quantities can be calculated in a matrix form in order to 
avoid loop implementation (see Exercise 15.4.7 on page 1025). 

Since $f'_{x,z}$ and $f'_{z,y}$ are easy to calculate, all the derivatives are calculated in a closed-form expression. For instance, the derivative of the logistic activation function is equal to:

$$f'(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^2} = \frac{1}{1 + e^{-z}} \left(1 - \frac{1}{1 + e^{-z}}\right) = f(z)\left(1 - f(z)\right)$$

It follows that $f'_{z,y}(v_{i,j}) = f_{z,y}(v_{i,j})\left(1 - f_{z,y}(v_{i,j})\right) = y_j(x_i)\left(1 - y_j(x_i)\right)$ and $f'_{x,z}(u_{i,h}) = z_{i,h}\left(1 - z_{i,h}\right)$. In Exercise 15.4.7 on page 1025, we consider other activation functions and loss functions.
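The forward and backward propagation steps can be sketched in matrix form for Model (15.19) with logistic activation functions and the loss (15.22). The function name `loss_and_grad`, the array shapes and the random inputs are assumptions of this sketch. A central finite difference on one coefficient of each layer is used as a sanity check of the analytical derivatives.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(x, y_obs, beta, gamma):
    """Loss (15.22) and its gradient for Model (15.19) with logistic f_xz and f_zy."""
    # Forward propagation
    z = logistic(x @ beta.T)                     # z[i, h], shape (n, n_z)
    y = logistic(z @ gamma.T)                    # y[i, j], shape (n, n_y)
    loss = 0.5 * np.sum((y - y_obs) ** 2)
    # Backward propagation, using f'(.) = f(.) * (1 - f(.)) for the logistic function
    delta_v = (y - y_obs) * y * (1 - y)          # (y_j(x_i) - y_ij) * f'_zy(v_ij)
    grad_gamma = delta_v.T @ z                   # sum_i delta_v[i, j] * z[i, h]
    delta_u = (delta_v @ gamma) * z * (1 - z)    # sum_j delta_v[i, j] * gamma[j, h] * f'_xz(u_ih)
    grad_beta = delta_u.T @ x                    # sum_i delta_u[i, h] * x[i, k]
    return loss, grad_beta, grad_gamma

rng = np.random.default_rng(6)
x = rng.normal(size=(8, 4))
y_obs = rng.uniform(size=(8, 2))
beta = rng.normal(size=(5, 4))
gamma = rng.normal(size=(2, 5))
loss, g_beta, g_gamma = loss_and_grad(x, y_obs, beta, gamma)

def finite_difference(param, index, eps=1e-6):
    """Central finite difference of the loss with respect to one coefficient."""
    up, dn = param.copy(), param.copy()
    up[index] += eps
    dn[index] -= eps
    if param is beta:
        l_up, l_dn = loss_and_grad(x, y_obs, up, gamma)[0], loss_and_grad(x, y_obs, dn, gamma)[0]
    else:
        l_up, l_dn = loss_and_grad(x, y_obs, beta, up)[0], loss_and_grad(x, y_obs, beta, dn)[0]
    return (l_up - l_dn) / (2 * eps)

fd_beta = finite_difference(beta, (1, 2))
fd_gamma = finite_difference(gamma, (0, 3))
```

The gradients are summed over the observations $i$ and the outputs $j$, which corresponds to $G(\theta) = \sum_i G_i(\theta)$; the matrix products replace the explicit loops, in the spirit of Remark 191.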


Examples Neural networks are sufficiently flexible that they can approximate any continuous function. Therefore, they are said to be ‘universal approximators’ (Bishop, 2006). Figure 15.26 illustrates this property when the function is $f(x) = 2\cos(x)$ or $f(x) = |x| - 2$. For that, we use the network structure (15.20) with two constants and direct links. The activation function $f_{x,z}$ is the hyperbolic tangent function, while the output scaling function $f_{z,y}$ is the identity function. The training step is done with 201 uniform points between $-4$ and $+4$. We notice that the accuracy depends on the number $n_z$ of hidden nodes. In particular, the approximation is very good when we consider three hidden nodes. The universal approximation property is certainly the main strength of neural networks. It suffices to increase the number of hidden nodes in order to achieve a given accuracy. However, this property is also the main weakness of neural networks. Indeed, the distinction between training and validation steps is not obvious, and overfitting risk is large.


The trade-off between $n_z$ and $\mathcal{L}(\theta)$ is not the only issue with neural networks. Another problem is the scaling of data. By applying activation functions, the output domain is not necessarily the set $\mathbb{R}^{n_y}$. In Figure 15.27, we have reported the approximation of $f(x) = |x| - 2$ by considering two hidden nodes and different configurations. The first panel corresponds to the network structure (15.19) without constant and direct link ($\beta_0 = 0$, $\gamma_0 = 0$ and $\gamma_x = 0$). In the second panel, we include the two constants $\beta_0$ and $\gamma_0$, but not the direct links ($\gamma_x = 0$). We notice that this second structure is better to approximate the function than the structure of the first panel. The reason is the range of $\operatorname{dom} f(x)$, which is better managed by including a constant $\gamma_0$. This is confirmed by the third panel. Finally, the fourth panel assumes that the output scaling function $f_{z,y}$ is the logistic sigmoid function. In this

Credit Scoring Models 985 


f(x) = 2cos(x) f(x) = 2cos(x) 
n, = 1 n= 3 


f(x) = Ixl - 2 f(x) = Ixl - 2 
n, = 1 n= 3 


Aaea e -1 0 7 2 3 4 °-4-3 -2 -1 0 1 2 3 4 


FIGURE 15.26: Neural networks as universal approximators 


Bo = 0, Yo = O and y, = O 


“4-5 -2 -1 0 1 2 8 4 


Logistic scaling 


ah r 2 sl 0 1 2 3 4 Shogo) ot 0 1 2 3 4 


FIGURE 15.27: The scaling issue of neural networks (f (x) = |x| — 2) 
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case, it is obvious that the output y (£j) € [0,1] cannot reach dom f (x) = [—2, 2]. This case 
is trivial while the three previous cases are not. This means that the network structure is 
crucial, not only the number of hidden units, but also the choice of activation and scaling 
functions, adding or not constants and direct links, and the scaling of both input and output 
data. 


FIGURE 15.28: Convergence of the XOR problem (panels: least squares error and cross-entropy error as a function of the iteration number)


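The XOR (or exclusive or) problem is a classic problem in neural networks. We note $y = x_1 \oplus x_2$ where $x_1$ and $x_2$ are two binary inputs:

$$
\begin{aligned}
0 \oplus 0 &= 0 \\
0 \oplus 1 &= 1 \\
1 \oplus 0 &= 1 \\
1 \oplus 1 &= 0
\end{aligned}
$$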

The XOR problem can be viewed as a supervised classification problem. In order to solve this problem, we use a neural network with three hidden nodes with no constant and no direct link. The activation and output scaling functions are set to the logistic function. In Figure 15.28, we have represented the evolution of the loss function $\mathcal{L}(\theta)$ with respect to the iterations of the learning rules, which are the steepest descent (SD), momentum (MOM), adaptive learning (AL) and adaptive learning with momentum (AL II) methods. We also consider a steepest descent with optimal stepsize (SD II) and the BFGS algorithm. Moreover, we have used two loss criteria: least squares and cross-entropy errors. The results teach two major lessons. First, the convergence highly depends on the learning rule, but also on the loss criterion. Second, a comparison of the optimal parameters $\hat{\theta}$ shows that they are all different. They differ from one learning rule to another, but they also differ from one loss criterion to another even if we use the same learning rule. This result is not surprising, because we observe that the solution $\hat{\theta}$ changes each time we consider new starting values. This means that neural networks produce models that are overidentified. In this context, it is perfectly illusory to analyze and understand the estimated model. As we have already said, only the predictions $\hat{y}_{i,j}$ are relevant.
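One simple source of this non-uniqueness can be illustrated directly: relabeling the hidden nodes changes the parameter values but not the predictions. The sketch below uses a network with three hidden nodes, logistic functions, no constant and no direct link, as in the XOR example. The weights are hypothetical values chosen for illustration; they are not the estimates discussed in the text.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, beta, gamma):
    """Forward pass without constants or direct links:
    z_h = f(beta_h' x) for each hidden node, then y = f(gamma' z)."""
    z = [logistic(sum(b * xk for b, xk in zip(row, x))) for row in beta]
    return logistic(sum(g * zh for g, zh in zip(gamma, z)))


# Hypothetical weights (illustrative only)
beta = [[4.0, -3.0], [-2.5, 5.0], [1.0, 1.5]]   # one row per hidden node
gamma = [3.0, -2.0, 0.5]                        # hidden-to-output weights

# Relabel the hidden nodes: a different parameter vector theta
perm = [2, 0, 1]
beta_perm = [beta[h] for h in perm]
gamma_perm = [gamma[h] for h in perm]

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
original = [forward(x, beta, gamma) for x in inputs]
permuted = [forward(x, beta_perm, gamma_perm) for x in inputs]

for x, a, b in zip(inputs, original, permuted):
    print(x, round(a, 6), round(b, 6))
```

The two parameter sets are different but give the same outputs for the four XOR inputs, which is one reason why interpreting individual coefficients is not meaningful.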


TABLE 15.15: Data of program effectiveness

OBS   GPA   TUCE  PSI  GRD  |  OBS   GPA   TUCE  PSI  GRD
  1   2.66   20    0    0   |   17   2.75   25    0    0
  2   2.89   22    0    0   |   18   2.83   19    0    0
  3   3.28   24    0    0   |   19   3.12   23    1    0
  4   2.92   12    0    0   |   20   3.16   25    1    1
  5   4.00   21    0    1   |   21   2.06   22    1    0
  6   2.86   17    0    0   |   22   3.62   28    1    1
  7   2.76   17    0    0   |   23   2.89   14    1    0
  8   2.87   21    0    0   |   24   3.51   26    1    0
  9   3.03   25    0    0   |   25   3.54   24    1    1
 10   3.92   29    0    1   |   26   2.83   27    1    1
 11   2.63   20    0    0   |   27   3.39   17    1    1
 12   3.32   23    0    0   |   28   2.67   24    1    0
 13   3.57   23    0    0   |   29   3.65   21    1    1
 14   3.26   25    0    1   |   30   4.00   23    1    1
 15   3.53   26    0    0   |   31   3.10   21    1    0
 16   2.74   19    0    0   |   32   2.39   19    1    1

Source: Greene (2017), Table F14.1 and Spector and Mazzeo (1980).


We consider the classification problem described in Greene (2017) based on the study of Spector and Mazzeo (1980), who examined whether a new method of teaching economics, the personalized system of instruction (PSI), significantly influenced performance in later economics courses. The corresponding data are reproduced in Table 15.15. OBS is the observation, that is the student. The output variable is GRD, which indicates whether the grade of the student increased (1) or decreased (0). The explanatory variables are the constant C, the grade point average GPA, the score TUCE on an economics test, and the binary variable PSI that indicates the participation in the new teaching method. Following Greene (2017), we estimate the following logit model:

$$
\Pr\{\mathrm{GRD}_i = 1\} = F\left(\beta_0 + \beta_1 \mathrm{GPA}_i + \beta_2 \mathrm{TUCE}_i + \beta_3 \mathrm{PSI}_i\right)
$$

where $F$ is the cumulative distribution function of the logistic distribution. The results are reported in Table 15.16, and the value of the optimized log-likelihood function is $\ell(\hat{\beta}) = -12.8896$. In order to challenge the logistic regression, we consider a neural network with three hidden nodes. The logistic function is used for both the activation and output scaling functions and we consider a direct link between the input variables (C, GPA, TUCE and PSI) and the output variable GRD. In Table 15.17, we have calculated the estimated probability $\hat{p}_i = \Pr\{\mathrm{GRD}_i = 1\}$ in the cases of the logit model:

$$
\hat{p}_i = F\left(x_i^\top \hat{\beta}^{(\mathrm{logit})}\right)
$$

and the neural network:

$$
\hat{p}_i = F\left(\hat{\gamma}^{(\mathrm{nn})\top} F\left(\hat{\beta}^{(\mathrm{nn})} x_i\right) + x_i^\top \hat{\gamma}_x^{(\mathrm{nn})}\right)
$$

TABLE 15.16: Results of the logistic regression

Parameter   Estimate   Standard error   t-statistic   p-value
β_0         -13.0214       4.9313         -2.6405      0.0134
β_1           2.8261       1.2629          2.2377      0.0334
β_2           0.0952       0.1415          0.6722      0.5069
β_3           2.3787       1.0646          2.2344      0.0336

TABLE 15.17: Estimated probability $\hat{p}_i = \Pr\{\mathrm{GRD}_i = 1\}$ (in %)

OBS   Logit      NN     |  OBS   Logit      NN
  1    2.658    2.658   |   17    5.363    5.363
  2    5.950    5.950   |   18    3.859    3.859
  3   18.726   18.726   |   19   58.987   58.987
  4    2.590    2.590   |   20   66.079   66.079
  5   56.989   56.989   |   21    6.138    6.138
  6    3.486    3.486   |   22   90.485   90.485
  7    2.650    2.650   |   23   24.177   24.177
  8    5.156    5.156   |   24   85.209   85.209
  9   11.113   11.113   |   25   83.829   83.829
 10   69.351   69.351   |   26   48.113   48.113
 11    2.447    2.447   |   27   63.542   63.542
 12   19.000   19.000   |   28   30.722   30.722
 13   32.224   32.224   |   29   84.170   84.170
 14   19.321   19.321   |   30   94.534   94.534
 15   36.099   36.099   |   31   52.912   52.912
 16    3.018    3.018   |   32   11.103   11.103


The results are surprising. The estimated probability calculated with the neural network is exactly equal to the estimated probability calculated with the logit model. If we inspect the estimated coefficients, we obtain:

$$
\hat{\beta}^{(\mathrm{nn})} =
\begin{pmatrix}
1.0343 & 0.8482 & 1.0678 & 0.5770 \\
0.3856 & 0.1976 & 1.4420 & 0.8744 \\
0.1925 & 0.8791 & 2.0427 & 0.5439
\end{pmatrix}
$$

$\hat{\gamma}^{(\mathrm{nn})} = (-2.9240, -2.9538, -3.6783)$ and $\hat{\gamma}_x^{(\mathrm{nn})} = (-3.4652, 2.8261, 0.0952, 2.3787)$. Moreover, the loss error is equal to $\mathcal{L}(\hat{\theta}) = 12.8896$, which is exactly the opposite of the optimized log-likelihood function. This result is not surprising because the neural network encompasses the logit model:

$$
\Pr\{\mathrm{GRD}_i = 1\} = F\Big(\underbrace{\gamma^{(\mathrm{nn})\top} F\left(\beta^{(\mathrm{nn})} x_i\right)}_{\text{specific nn effect}} + \underbrace{x_i^\top \gamma_x^{(\mathrm{nn})}}_{\text{logit effect}}\Big)
$$


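We also notice that the logit coefficients are the same as the neural network coefficients for the direct link units ($\hat{\beta}^{(\mathrm{logit})} = \hat{\gamma}_x^{(\mathrm{nn})}$) with the exception of the constant⁴⁹. Let us estimate the neural network by using other starting values for the optimization step. We obtain the same probabilities as previously, but the estimated coefficients are not the same. We have:

$$
\hat{\beta}^{(\mathrm{nn})} =
\begin{pmatrix}
0.4230 & 0.9108 & 0.5875 & 0.0882 \\
0.9586 & 0.2078 & 0.7862 & 0.4852 \\
0.7835 & 2.9180 & 8.7259 & 0.4899
\end{pmatrix}
$$

$\hat{\gamma}^{(\mathrm{nn})} = (-4.5296, -4.3299, -4.2120)$ and $\hat{\gamma}_x^{(\mathrm{nn})} = (0.0501, 2.8261, 0.0952, 2.3787)$. Again, the neural network coefficients for the direct link units are equal to the logit coefficients with the exception of the constant. We deduce that the neural network does not differ from the logit model, because we have:

$$
\hat{\beta}_0^{(\mathrm{logit})} = \hat{\gamma}^{(\mathrm{nn})\top} F\left(\hat{\beta}^{(\mathrm{nn})} x_i\right) + \hat{\gamma}_{x,0}^{(\mathrm{nn})}
$$

where $\hat{\gamma}_{x,0}^{(\mathrm{nn})}$ is the constant of the direct link. Indeed, the hidden units are saturated for all the observations ($F(\hat{\beta}^{(\mathrm{nn})} x_i) \approx \mathbf{1}_3$), so that the hidden layer only contributes a constant: $-2.9240 - 2.9538 - 3.6783 - 3.4652 = -13.0213$ for the first estimation and $-4.5296 - 4.3299 - 4.2120 + 0.0501 = -13.0214$ for the second one. This result is interesting, because it shows that the neural network did not do better than the logit model, although it is more flexible.

⁴⁹ The constant is equal to -13.0214 for the logit model and -3.4652 for the neural network.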
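The equivalence between the two models can be checked numerically with the coefficients reported above. The sketch below is only an illustration: it uses the first set of neural network estimates, a handful of observations taken from Table 15.15, and function names of our own choosing.

```python
import math


def F(z: float) -> float:
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


# Estimates reported in the text (Table 15.16 and first neural network fit)
beta_logit = [-13.0214, 2.8261, 0.0952, 2.3787]
beta_nn = [
    [1.0343, 0.8482, 1.0678, 0.5770],
    [0.3856, 0.1976, 1.4420, 0.8744],
    [0.1925, 0.8791, 2.0427, 0.5439],
]
gamma_nn = [-2.9240, -2.9538, -3.6783]
gamma_x_nn = [-3.4652, 2.8261, 0.0952, 2.3787]

# (C, GPA, TUCE, PSI) for observations 1, 5, 19 and 30 of Table 15.15
students = {
    1: [1.0, 2.66, 20.0, 0.0],
    5: [1.0, 4.00, 21.0, 0.0],
    19: [1.0, 3.12, 23.0, 1.0],
    30: [1.0, 4.00, 23.0, 1.0],
}


def p_logit(x) -> float:
    return F(dot(x, beta_logit))


def p_nn(x) -> float:
    hidden = [F(dot(row, x)) for row in beta_nn]       # saturated, close to 1
    return F(dot(gamma_nn, hidden) + dot(x, gamma_x_nn))


for obs, x in students.items():
    print(obs, round(100 * p_logit(x), 3), round(100 * p_nn(x), 3))

# Constant implied by the neural network when the hidden units are saturated
implied_constant = sum(gamma_nn) + gamma_x_nn[0]
print(implied_constant)
```

Up to the rounding of the published coefficients, the two probabilities coincide and the implied constant matches the logit constant.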


Remark 192 The previous results are explained because we optimize the cross-entropy er- 
ror loss for estimating the parameters of the neural network. This implies that the logit 
framework is perfectly compatible with the neural network framework. 


15.2.3.3 Support vector machines 


The overidentification of neural networks is an important issue. Moreover, the optimization step involves an objective function that is generally not convex with respect to the parameters, which implies that there are many local minima, and these learning models have little theoretical foundation. Like neural networks, support vector machines (SVM) can be seen as an extension of the perceptron. However, they have nice theoretical properties and a strong geometrical framework. SVMs were first developed for linear classification, and have since been extended to non-linear classification and regression.


TABLE 15.18: An example of linearly separable observations

i          1     2     3     4     5     6     7
x_{i,1}   0.5   2.7   2.7   1.7   1.5   2.3   4.0
x_{i,2}   2.5   4.2   2.0   4.2   0.7   5.3   6.9
y_i       +1    +1    +1    +1    +1    +1    +1

i          8     9    10    11    12    13    14    15
x_{i,1}   6.4   7.7   8.8   7.4   6.5   8.3   6.0   5.0
x_{i,2}   4.5   2.2   6.0   6.5   1.7   1.3   1.3   0.5
y_i       -1    -1    -1    -1    -1    -1    -1    -1


Separating hyperplanes We consider a training set $\{(x_i, y_i), i = 1, \ldots, n\}$, where the response variable $y_i$ can take the values $-1$ and $+1$. This training set is said to be linearly separable if there is a hyperplane $\mathcal{H} = \{x \in \mathbb{R}^K : f(x) = \beta_0 + x^\top \beta = 0\}$ such that:

$$
y_i = \operatorname{sign} f(x_i)
$$

This means that the hyperplane divides the affine space into two half-spaces⁵⁰ such that $\{x_i : y_i = +1\} \subset \mathcal{H}^+$ and $\{x_i : y_i = -1\} \subset \mathcal{H}^-$. Let us consider the example with two explanatory variables given in Table 15.18. We have represented the data $(x_{i,1}, x_{i,2})$ and the corresponding label $y_i$ in Figure 15.29. It is obvious that this training set is linearly separable. For example, we have reported three hyperplanes $\mathcal{H}_1$, $\mathcal{H}_2$ and $\mathcal{H}_3$ that perform a perfect classification.

⁵⁰ The upper half-space $\mathcal{H}^+$ is defined by $f(x) > 0$ while the lower half-space $\mathcal{H}^-$ corresponds to $f(x) < 0$.

FIGURE 15.29: Separating hyperplane picking
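Linear separability of the data in Table 15.18 can be checked by exhibiting one separating hyperplane. The sketch below does this with the classical perceptron rule, not with an SVM: it returns some separating hyperplane, which is generally not the one that maximizes the margin. The function names and the maximum number of epochs are our own assumptions.

```python
# Data of Table 15.18: observations 1-7 have y = +1, observations 8-15 have y = -1
X = [(0.5, 2.5), (2.7, 4.2), (2.7, 2.0), (1.7, 4.2), (1.5, 0.7), (2.3, 5.3),
     (4.0, 6.9), (6.4, 4.5), (7.7, 2.2), (8.8, 6.0), (7.4, 6.5), (6.5, 1.7),
     (8.3, 1.3), (6.0, 1.3), (5.0, 0.5)]
y = [+1] * 7 + [-1] * 8


def f(x, b0, b):
    """Discriminant function f(x) = b0 + x' b."""
    return b0 + b[0] * x[0] + b[1] * x[1]


def perceptron(X, y, max_epochs=10_000):
    """Perceptron rule: update the hyperplane on each misclassified point.
    It stops at the first pass without error, which happens in finitely
    many updates when the training set is linearly separable."""
    b0, b = 0.0, [0.0, 0.0]
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * f(xi, b0, b) <= 0:
                b0 += yi
                b[0] += yi * xi[0]
                b[1] += yi * xi[1]
                errors += 1
        if errors == 0:
            return b0, b
    raise RuntimeError("no separating hyperplane found")


b0, b = perceptron(X, y)
separates = all(yi * f(xi, b0, b) > 0 for xi, yi in zip(X, y))
print(b0, b, separates)
```

Changing the order of the observations or the starting values typically gives another separating hyperplane, which is the multiplicity of solutions discussed in the text.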


Since there are many solutions, we may wonder if there exists one solution that dominates the others. The answer was proposed by Vladimir Vapnik and Alexey Chervonenkis in the sixties, who formulated the concept of support vector machines. Following Cortes and Vapnik (1995), the optimal hyperplane is the one that maximizes the margin. In Figure 15.30, we have represented a hyperplane and the two margins $M_+$ and $M_-$, which correspond to the Euclidean distances between the hyperplane and the closest positive and negative points. The underlying idea of Vapnik and Chervonenkis is then to find the hyperplane $\mathcal{H}$ with the largest values of $M_+$ and $M_-$.


We notice that finding a hyperplane with two different margins $M_+ \neq M_-$ is equivalent to defining a hyperplane with the same positive and negative margins: $M_+ = M_- = M$. This implies that the two separating hyperplanes $\mathcal{H}_+$ and $\mathcal{H}_-$ are equidistant from the hyperplane $\mathcal{H}$. The estimation of $\mathcal{H}_+$ and $\mathcal{H}_-$ requires identifying the training points that belong to $\mathcal{H}_+$ and $\mathcal{H}_-$. These points are called the support vectors. In the case of Figure 15.30, two support vectors are necessary to define $\mathcal{H}_+$ and $\mathcal{H}_-$, or equivalently $\mathcal{H}$ and the margin $M$. By construction, the number of support points is at least equal to the number of explanatory variables. Except in degenerate cases, there are far fewer support points than observations. This implies that not all observations are relevant for defining the decision boundary of an optimal linear classifier. Only the support vectors are important.


Hard margin classification The maximization problem is:

$$
\{\hat{\beta}_0, \hat{\beta}\} = \arg\max M
\quad \text{s.t.} \quad
\begin{cases}
f(x_i) \geq M & \text{if } y_i = +1 \\
f(x_i) \leq -M & \text{if } y_i = -1
\end{cases}
$$

FIGURE 15.30: Margins of separation

However, this optimization problem is not well defined, since $M$ depends on $\beta$. More precisely, it is inversely proportional to $\|\beta\|_2$. This is why we need to add another constraint, e.g. $\beta_1 = 1$ or $\|\beta\|_2 = 1$. Another approach is to standardize the problem by setting $M = 1$.


Let $x_-$ and $x_+$ be two (negative and positive) support vectors. We deduce that the distance between $x_-$ and $x_+$ is equal to⁵¹:

$$
d(x_-, x_+) = \beta^\top (x_+ - x_-) = 2M
$$

If we replace $\beta$ by the corresponding unit vector $\bar{\beta} = \beta / \|\beta\|_2$, we obtain $\bar{\beta}^\top (x_+ - x_-) = 2M / \|\beta\|_2$. By setting $M = 1$ in the constraints, the margin becomes $1 / \|\beta\|_2$. Maximizing the margin is then equivalent to maximizing $1 / \|\beta\|_2$ or minimizing $\|\beta\|_2$ (or $\frac{1}{2} \|\beta\|_2^2$). Moreover, we notice that the inequality constraints⁵² can be compacted as $y_i f(x_i) \geq 1$. Finally, we obtain the following optimization problem:

$$
\begin{aligned}
\{\hat{\beta}_0, \hat{\beta}\} &= \arg\min \frac{1}{2} \|\beta\|_2^2 \qquad (15.25) \\
\text{s.t.} \quad & y_i \left(\beta_0 + x_i^\top \beta\right) \geq 1 \quad \text{for } i = 1, \ldots, n
\end{aligned}
$$

We recognize a standard quadratic programming (QP) problem that can be easily solved from a numerical point of view.


Using the training set given in Table 15.18 on page 989, and solving the QP problem (15.25), we obtain $\hat{\beta}_0 = 2.416$, $\hat{\beta}_1 = -0.708$ and $\hat{\beta}_2 = 0.248$. It follows that the margin $M$ is equal to 1.333. Since the equation $\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 = c$ is equivalent to:

$$
x_2 = \frac{c - \hat{\beta}_0}{\hat{\beta}_2} - \frac{\hat{\beta}_1}{\hat{\beta}_2} x_1
$$

we deduce that the equations of the three hyperplanes $\mathcal{H}_-$, $\mathcal{H}$ and $\mathcal{H}_+$ are:

$$
\begin{aligned}
\mathcal{H}_- &: \; x_2 = -13.786 + 2.857 \cdot x_1 & (c = -1) \\
\mathcal{H} &: \; x_2 = -9.750 + 2.857 \cdot x_1 & (c = 0) \\
\mathcal{H}_+ &: \; x_2 = -5.714 + 2.857 \cdot x_1 & (c = +1)
\end{aligned}
$$

We have reported the estimated hyperplanes in Figure 15.31, and have also indicated the support vectors, of which there are only three.

⁵¹ We have $\beta_0 + x_-^\top \beta = -M$ and $\beta_0 + x_+^\top \beta = M$.
⁵² Because we have set $M = 1$.

FIGURE 15.31: Optimal hyperplane
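The reported solution can be checked against the constraints of Problem (15.25). The sketch below takes the rounded estimates from the text as given (it does not solve the QP problem) and recovers the margin, the support vectors and the three hyperplane equations. The tolerance used to detect active constraints is our own choice, made to absorb the rounding of the coefficients.

```python
import math

# Data of Table 15.18
X = [(0.5, 2.5), (2.7, 4.2), (2.7, 2.0), (1.7, 4.2), (1.5, 0.7), (2.3, 5.3),
     (4.0, 6.9), (6.4, 4.5), (7.7, 2.2), (8.8, 6.0), (7.4, 6.5), (6.5, 1.7),
     (8.3, 1.3), (6.0, 1.3), (5.0, 0.5)]
y = [+1] * 7 + [-1] * 8

# Rounded estimates reported in the text
b0, b = 2.416, (-0.708, 0.248)

# Margin M = 1 / ||beta||_2
margin = 1.0 / math.hypot(*b)

# Values of y_i * f(x_i): all of them must be (approximately) >= 1
functional = [yi * (b0 + b[0] * x1 + b[1] * x2) for (x1, x2), yi in zip(X, y)]

# Support vectors: observations for which the constraint is active
support = [i + 1 for i, m in enumerate(functional) if abs(m - 1.0) < 0.01]


def line(c):
    """Intercept and slope of b0 + b1 x1 + b2 x2 = c written as x2 = a + s x1."""
    return (c - b0) / b[1], -b[0] / b[1]


print(round(margin, 3), support)
for c in (-1, 0, +1):
    print(c, [round(v, 3) for v in line(c)])
```

Small differences with the hyperplane equations of the text come from the rounding of $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$ to three decimals.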


The historical approach to estimate a support vector machine is to map the primal QP problem to the dual QP problem. Using the results provided in Appendix A.1.3.1 on page 1046, we can show that⁵³:

$$
\begin{aligned}
\hat{\alpha} &= \arg\min \frac{1}{2} \alpha^\top \Gamma \alpha - \alpha^\top \mathbf{1}_n \qquad (15.26) \\
\text{s.t.} \quad & y^\top \alpha = 0 \\
& \alpha \geq \mathbf{0}_n
\end{aligned}
$$

where $\alpha$ is the vector of Lagrange multipliers associated with the $n$ inequality constraints and $\Gamma_{i,j} = y_i y_j x_i^\top x_j$. Moreover, we have:

$$
\hat{\beta} = \sum_{i=1}^n \hat{\alpha}_i y_i x_i
$$

The optimal value of $\hat{\beta}_0$ can be deduced from any support vector. In the case of a positive support vector $x_+$, we have $\hat{\beta}_0 = 1 - x_+^\top \hat{\beta}$, while we have $\hat{\beta}_0 = -1 - x_-^\top \hat{\beta}$ for any negative support vector $x_-$. Moreover, we can classify new observations by considering the following rule:

$$
\hat{y} = \operatorname{sign}\left(\hat{\beta}_0 + x^\top \hat{\beta}\right)
$$

⁵³ See Exercise 15.4.8 on page 1027.

If we consider our example, we observe that $\hat{\alpha}_i$ is different from zero for three observations: $i \in \{3, 8, 15\}$. They correspond to the three support vectors that we have found graphically. We obtain $\hat{\alpha}_3 = 0.2813$, $\hat{\alpha}_8 = 0.0435$ and $\hat{\alpha}_{15} = 0.2378$. With these values, we deduce that $\hat{\beta}_1 = -0.708$ and $\hat{\beta}_2 = 0.248$. In order to compute $\hat{\beta}_0$, we consider one of the support vectors and calculate $\hat{\beta}_0 = y_i - x_i^\top \hat{\beta}$. For example, in the case of the first support vector (or the third observation), we have: $\hat{\beta}_0 = 1 + 2.7 \times 0.708 - 2 \times 0.248 = 2.416$.
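The passage from the dual solution to the primal parameters can be reproduced with a few lines of code. This sketch takes the three non-zero multipliers reported in the text as given (it does not solve the dual QP problem), and the variable names are our own.

```python
# Non-zero Lagrange multipliers reported in the text
alpha = {3: 0.2813, 8: 0.0435, 15: 0.2378}

# Corresponding observations of Table 15.18: (x_i, y_i)
points = {3: ((2.7, 2.0), +1), 8: ((6.4, 4.5), -1), 15: ((5.0, 0.5), -1)}

# beta = sum_i alpha_i * y_i * x_i (only the support vectors contribute)
beta = [
    sum(alpha[i] * points[i][1] * points[i][0][k] for i in alpha)
    for k in range(2)
]

# beta_0 = y_i - x_i' beta, computed from each support vector
beta0 = {
    i: yi - (xi[0] * beta[0] + xi[1] * beta[1])
    for i, (xi, yi) in points.items()
}

# Equality constraint of the dual problem: y' alpha = 0
balance = sum(alpha[i] * points[i][1] for i in alpha)

print([round(bk, 3) for bk in beta])
print({i: round(v, 3) for i, v in beta0.items()}, balance)
```

The three support vectors give the same value of $\hat{\beta}_0$ up to the rounding of the multipliers, which is a simple consistency check of a dual solution.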


Remark 193 We may wonder what the rationale for using the dual problem is. The primal problem is a QP problem with $K + 1$ unknowns and $n$ inequality constraints. The dual problem is a QP problem with $n$ unknowns, one equality constraint and $n$ box constraints. Since the latter constraints are straightforward to manage, the second problem is easier to solve than the first one. However, the dimension of the second problem is larger than that of the first problem, since we have to calculate the matrix $\Gamma$ of dimension $n \times n$. Therefore, it is difficult to argue that the dual problem raises fewer computational issues than the primal problem. The reason is to be found elsewhere. In fact, the calculation of $\Gamma$ involves the calculation of the inner product $\langle x_i, x_j \rangle = x_i^\top x_j$. We will see later that it corresponds to a covariance kernel, and that the dual problem can be used more efficiently than the primal problem with other covariance kernels when we consider non-linear SVM problems.


Soft margin classification The inequality constraints $y_i \left(\beta_0 + x_i^\top \beta\right) \geq 1$ ensure that all the training points are well-classified and belong to the half-spaces $\mathcal{H}_+$ and $\mathcal{H}_-$. However, training data are generally not fully linearly separable. Therefore, we can relax these constraints by introducing slack variables $\xi_i \geq 0$:

$$
y_i \left(\beta_0 + x_i^\top \beta\right) \geq 1 - \xi_i
$$

We then face three situations:

1. if $\xi_i = 0$, the observation $i$ is well-classified since we have $y_i \left(\beta_0 + x_i^\top \beta\right) \geq 1$;

2. if $0 < \xi_i < 1$, the observation $i$ is located in the ‘street’, that is in the area between the two separating planes $\mathcal{H}_-$ and $\mathcal{H}_+$; in this case, $\xi_i$ can be interpreted as the margin error ($\xi_i < M$);

3. if $\xi_i \geq 1$, the observation $i$ is fully misclassified.

The quality of the classification can be measured by the misclassification error sum, which we can bound:

$$
\sum_{i=1}^n \xi_i \leq \xi^\star
$$


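The parameter $\xi^\star$ indicates the tolerance we have with respect to the hard margin classification. Instead of adding the inequality constraint $\sum_{i=1}^n \xi_i \leq \xi^\star$ in Problem (15.25), we can penalize the objective function:

$$
\begin{aligned}
\{\hat{\beta}_0, \hat{\beta}, \hat{\xi}\} &= \arg\min \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^n \xi_i \qquad (15.27) \\
\text{s.t.} \quad &
\begin{cases}
y_i \left(\beta_0 + x_i^\top \beta\right) \geq 1 - \xi_i \\
\xi_i \geq 0
\end{cases}
\quad \text{for } i = 1, \ldots, n
\end{aligned}
$$

where the parameter $C$ controls the level of errors. If $C$ is large, the norm $\|\beta\|_2$ can be large. On the contrary, if $C$ is small, the sum $\sum_{i=1}^n \xi_i$ can be large, but not the norm $\|\beta\|_2$. As the margin $M$ is equal to $1 / \|\beta\|_2$, $C$ then controls the trade-off between the size of the margin and the misclassification error rate. The dual problem is⁵⁴:

$$
\begin{aligned}
\hat{\alpha} &= \arg\min \frac{1}{2} \alpha^\top \Gamma \alpha - \alpha^\top \mathbf{1}_n \qquad (15.28) \\
\text{s.t.} \quad & y^\top \alpha = 0 \\
& \mathbf{0}_n \leq \alpha \leq C \cdot \mathbf{1}_n
\end{aligned}
$$

Again, we have $\hat{\beta} = \sum_{i=1}^n \hat{\alpha}_i y_i x_i$. Support vectors then correspond to training points such that $0 < \hat{\alpha}_i < C$. For computing $\hat{\beta}_0$, we average over all the support vectors:

$$
\hat{\beta}_0 = \frac{\sum_{i=1}^n \mathbb{1}\{0 < \hat{\alpha}_i < C\} \cdot \left(y_i - x_i^\top \hat{\beta}\right)}{\sum_{i=1}^n \mathbb{1}\{0 < \hat{\alpha}_i < C\}}
$$

Since we have $y_i \left(\beta_0 + x_i^\top \beta\right) \geq 1 - \xi_i$ and $\xi_i \geq 0$, the Kuhn-Tucker conditions imply that:

$$
\hat{\xi}_i = \max\left(0, 1 - y_i \left(\hat{\beta}_0 + x_i^\top \hat{\beta}\right)\right) \qquad (15.29)
$$

The classification rule does not change, and we have $\hat{y} = \operatorname{sign}\left(\hat{\beta}_0 + x^\top \hat{\beta}\right)$.

FIGURE 15.32: Soft margin SVM classifiers (one panel per value of $C$, including $C = 0.01$ and $C = 0.03$)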


We consider the previous training set given in Table 15.18 and we introduce two points (6.0, 5.0, +1) ($i = 16$) and (2.0, 2.0, -1) ($i = 17$). In this case, the training set is not linearly separable. Considering different values of $C$, we have represented the optimal hyperplanes in Figure 15.32. We verify that the margin decreases when $C$ increases. In the case where $C$ is equal to 0.05, we obtain $\hat{\beta}_0 = 1.533$, $\hat{\beta}_1 = -0.458$, $\hat{\beta}_2 = 0.168$, and the optimal values of $\hat{\alpha}_i$ and $\hat{\xi}_i$ are reported in Table 15.19.

⁵⁴ See Exercise 15.4.8 on page 1027.

TABLE 15.19: Soft margin classification with C = 0.05

 i    y_i   x_{i,1}   x_{i,2}    α̂_i     ξ̂_i
 1    +1     0.5       2.5      0.000    0.000
 2    +1     2.7       4.2      0.039    0.000
 3    +1     2.7       2.0      0.050    0.369
 4    +1     1.7       4.2      0.000    0.000
 5    +1     1.5       0.7      0.050    0.038
 6    +1     2.3       5.3      0.000    0.000
 7    +1     4.0       6.9      0.050    0.143
 8    -1     6.4       4.5      0.050    0.354
 9    -1     7.7       2.2      0.000    0.000
10    -1     8.8       6.0      0.000    0.000
11    -1     7.4       6.5      0.050    0.231
12    -1     6.5       1.7      0.000    0.000
13    -1     8.3       1.3      0.000    0.000
14    -1     6.0       1.3      0.039    0.000
15    -1     5.0       0.5      0.050    0.324
16    +1     6.0       5.0      0.050    1.379
17    -1     2.0       2.0      0.050    1.952
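The figures of Table 15.19 can be cross-checked with Equation (15.29) and the relationship between the dual and primal solutions. The sketch below takes the rounded estimates of the text as given and does not solve the QP problem; the variable names are ours.

```python
# Rows of Table 15.19: (y_i, x_i1, x_i2, alpha_i, xi_i)
rows = [
    (+1, 0.5, 2.5, 0.000, 0.000), (+1, 2.7, 4.2, 0.039, 0.000),
    (+1, 2.7, 2.0, 0.050, 0.369), (+1, 1.7, 4.2, 0.000, 0.000),
    (+1, 1.5, 0.7, 0.050, 0.038), (+1, 2.3, 5.3, 0.000, 0.000),
    (+1, 4.0, 6.9, 0.050, 0.143), (-1, 6.4, 4.5, 0.050, 0.354),
    (-1, 7.7, 2.2, 0.000, 0.000), (-1, 8.8, 6.0, 0.000, 0.000),
    (-1, 7.4, 6.5, 0.050, 0.231), (-1, 6.5, 1.7, 0.000, 0.000),
    (-1, 8.3, 1.3, 0.000, 0.000), (-1, 6.0, 1.3, 0.039, 0.000),
    (-1, 5.0, 0.5, 0.050, 0.324), (+1, 6.0, 5.0, 0.050, 1.379),
    (-1, 2.0, 2.0, 0.050, 1.952),
]
C = 0.05
b0, b1, b2 = 1.533, -0.458, 0.168   # rounded estimates reported in the text

# Equation (15.29): xi_i = max(0, 1 - y_i * (b0 + x_i' b))
slack = [max(0.0, 1.0 - yi * (b0 + b1 * x1 + b2 * x2))
         for yi, x1, x2, _, _ in rows]
max_gap = max(abs(s - xi) for s, (_, _, _, _, xi) in zip(slack, rows))

# beta = sum_i alpha_i * y_i * x_i
beta_from_dual = (
    sum(a * yi * x1 for yi, x1, _, a, _ in rows),
    sum(a * yi * x2 for yi, _, x2, a, _ in rows),
)

# Support vectors used to compute beta_0: 0 < alpha_i < C
margin_sv = [i + 1 for i, (_, _, _, a, _) in enumerate(rows) if 0 < a < C]

# Fully misclassified observations: xi_i >= 1
misclassified = [i + 1 for i, s in enumerate(slack) if s >= 1.0]

print(round(max_gap, 4), [round(v, 3) for v in beta_from_dual])
print(margin_sv, misclassified)
```

Observations with $\hat{\alpha}_i = C$ are those with a positive slack, whereas the two observations with $0 < \hat{\alpha}_i < C$ lie on the separating planes, in line with the Kuhn-Tucker conditions.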


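If we combine Equations (15.27) and (15.29), we obtain:

$$
\begin{aligned}
f(\beta_0, \beta) &= \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^n \max\left(0, 1 - y_i \left(\beta_0 + x_i^\top \beta\right)\right) \\
&= C \cdot \left(\sum_{i=1}^n \max\left(0, 1 - y_i \left(\beta_0 + x_i^\top \beta\right)\right) + \frac{1}{2C} \|\beta\|_2^2\right)
\end{aligned}
$$

We deduce that the optimization program is:

$$
\{\hat{\beta}_0, \hat{\beta}\} = \arg\min \mathcal{R}(x, y) + \frac{1}{2C} \|\beta\|_2^2 \qquad (15.30)
$$

where $\mathcal{R}(x, y) = \sum_{i=1}^n \mathcal{L}(x_i, y_i)$ and $\mathcal{L}(x_i, y_i)$ is the binary hinge loss:

$$
\mathcal{L}^{\mathrm{hinge}}(x_i, y_i) = \max\left(0, 1 - y_i \left(\beta_0 + x_i^\top \beta\right)\right)
$$

It follows that the soft margin classification corresponds to a risk minimization problem with a ridge penalization. The problem is convex but non-smooth because $\mathcal{L}^{\mathrm{hinge}}(x_i, y_i)$ is non-differentiable. More generally, we can use other loss functions, for instance the 0-1 loss:

$$
\mathcal{L}^{0-1}(x_i, y_i) =
\begin{cases}
0 & \text{if } y_i \left(\beta_0 + x_i^\top \beta\right) \geq 0 \\
1 & \text{otherwise}
\end{cases}
$$

However, the associated risk measure is non-convex, and the minimization problem is computationally hard. A better approach is to consider the squared hinge loss:

$$
\mathcal{L}^{\mathrm{hinge}^2}(x_i, y_i) = \max\left(0, 1 - y_i \left(\beta_0 + x_i^\top \beta\right)\right)^2
$$

In this case, the problem is convex and smooth. Another popular loss function is the ramp loss:

$$
\mathcal{L}^{\mathrm{ramp}}(x_i, y_i) = \min\left(1, \mathcal{L}^{\mathrm{hinge}}(x_i, y_i)\right)
$$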
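The four loss functions are easier to compare when they are written as functions of the quantity $m_i = y_i \left(\beta_0 + x_i^\top \beta\right)$. The sketch below is an illustration under our own naming conventions; in particular, the treatment of the boundary case $m_i = 0$ in the 0-1 loss is an assumption.

```python
def hinge(m: float) -> float:
    """Binary hinge loss max(0, 1 - m), where m = y * f(x)."""
    return max(0.0, 1.0 - m)


def zero_one(m: float) -> float:
    """0-1 loss (assumption: m = 0 is counted as a correct classification)."""
    return 0.0 if m >= 0.0 else 1.0


def squared_hinge(m: float) -> float:
    """Squared hinge loss, which is convex and smooth."""
    return hinge(m) ** 2


def ramp(m: float) -> float:
    """Ramp loss: the hinge loss capped at 1."""
    return min(1.0, hinge(m))


# m = -1: misclassified, m = 0.5: inside the 'street', m = 2: well-classified
for m in (-1.0, 0.5, 2.0):
    print(m, hinge(m), zero_one(m), squared_hinge(m), ramp(m))
```

The hinge and squared hinge losses keep increasing with the size of the error, whereas the 0-1 and ramp losses are bounded, which makes them less sensitive to outliers but also non-convex.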
The derivation of the dual problems and the comparison of these different loss functions are discussed in Exercise 15.4.8 on page 1028.

SVM regression Support vector machines can be extended to output variables that are continuous. In this case, we have to define an appropriate loss function. For instance, if we consider the least squares loss function, we have:

$$
\mathcal{L}^{\mathrm{ls}}(x_i, y_i) = \left(y_i - f(x_i)\right)^2
$$

where $f(x) = \beta_0 + x^\top \beta$. The corresponding SVM regression is then:

$$
\begin{aligned}
\{\hat{\beta}_0, \hat{\beta}, \hat{\xi}\} &= \arg\min \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^n \xi_i^2 \qquad (15.31) \\
\text{s.t.} \quad & y_i = \beta_0 + x_i^\top \beta + \xi_i \quad \text{for } i = 1, \ldots, n
\end{aligned}
$$

It is obvious that $\xi_i$ plays the role of the residual. This regression problem looks very similar to the SVM problem for the soft margin classification with the squared hinge loss function. In particular, we can show that the dual problem is⁵⁵:

$$
\begin{aligned}
\hat{\alpha} &= \arg\min \frac{1}{2} \alpha^\top \left(X X^\top + \frac{1}{2C} I_n\right) \alpha - \alpha^\top Y \qquad (15.32) \\
\text{s.t.} \quad & \mathbf{1}_n^\top \alpha = 0
\end{aligned}
$$

Once we have solved this QP problem, we can calculate the prediction for $x$: $\hat{y} = \hat{\beta}_0 + x^\top \hat{\beta}$.


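Vapnik (1998) proposed another loss function in order to keep the formalism of the original soft margin problem:

$$
\mathcal{L}^{\varepsilon}(x_i, y_i) = \mathbb{1}\{|y_i - f(x_i)| \geq \varepsilon\} \cdot \left(|y_i - f(x_i)| - \varepsilon\right)
$$

where $\varepsilon \geq 0$. It follows that:

$$
\mathcal{L}^{\varepsilon}(x_i, y_i) =
\begin{cases}
|y_i - f(x_i)| - \varepsilon & \text{if } |y_i - f(x_i)| \geq \varepsilon \\
0 & \text{if } |y_i - f(x_i)| < \varepsilon
\end{cases}
$$

Therefore, we would like to find a hyperplane such that we do not care about the errors that are smaller than $\varepsilon$. We have:

$$
\begin{aligned}
\mathcal{L}^{\varepsilon}(x_i, y_i) &= \mathbb{1}\{y_i - f(x_i) \leq -\varepsilon\} \cdot \left(f(x_i) - y_i - \varepsilon\right) + \mathbb{1}\{y_i - f(x_i) \geq \varepsilon\} \cdot \left(y_i - f(x_i) - \varepsilon\right) \\
&= \mathbb{1}\{\xi_i^- \geq 0\} \cdot \xi_i^- + \mathbb{1}\{\xi_i^+ \geq 0\} \cdot \xi_i^+
\end{aligned}
$$

where $\xi_i^- = f(x_i) - y_i - \varepsilon$ and $\xi_i^+ = y_i - f(x_i) - \varepsilon$. We deduce that the $\varepsilon$-SVM regression problem is:

$$
\begin{aligned}
\{\hat{\beta}_0, \hat{\beta}, \hat{\xi}^-, \hat{\xi}^+\} &= \arg\min \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^n \left(\xi_i^- + \xi_i^+\right) \qquad (15.33) \\
\text{s.t.} \quad &
\begin{cases}
f(x_i) - y_i \leq \varepsilon + \xi_i^- \\
y_i - f(x_i) \leq \varepsilon + \xi_i^+ \\
\xi_i^- \geq 0 \\
\xi_i^+ \geq 0
\end{cases}
\quad \text{for } i = 1, \ldots, n
\end{aligned}
$$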
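The decomposition of the $\varepsilon$-insensitive loss into a negative and a positive slack can be illustrated numerically. In the sketch below, the slacks are clipped at zero, which corresponds to the constraints $\xi_i^- \geq 0$ and $\xi_i^+ \geq 0$ of Problem (15.33); the function names and the numerical values are illustrative assumptions.

```python
def eps_loss(y: float, f: float, eps: float) -> float:
    """Epsilon-insensitive loss: errors smaller than eps are ignored."""
    return max(0.0, abs(y - f) - eps)


def slacks(y: float, f: float, eps: float):
    """Slack variables (xi_minus, xi_plus) that are optimal for a given f:
    xi_minus is active when f overshoots y by more than eps,
    xi_plus is active when f undershoots y by more than eps."""
    xi_minus = max(0.0, f - y - eps)
    xi_plus = max(0.0, y - f - eps)
    return xi_minus, xi_plus


# (y, f(x)) pairs: undershoot, small error inside the tube, overshoot
cases = [(3.0, 1.0), (1.2, 1.0), (0.0, 2.5)]
eps = 0.5
for y, f in cases:
    xi_minus, xi_plus = slacks(y, f, eps)
    print(y, f, eps_loss(y, f, eps), xi_minus, xi_plus)
```

At most one of the two slacks is positive for a given observation, and their sum is equal to the $\varepsilon$-insensitive loss.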


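⁵⁵ See Exercise 15.4.8 on page 1028.

We can show that the dual problem is⁵⁶:

$$
\begin{aligned}
\{\hat{\alpha}^-, \hat{\alpha}^+\} &= \arg\min \frac{1}{2} \left(\alpha^- - \alpha^+\right)^\top X X^\top \left(\alpha^- - \alpha^+\right) + \varepsilon \left(\alpha^- + \alpha^+\right)^\top \mathbf{1}_n + \left(\alpha^- - \alpha^+\right)^\top Y \qquad (15.34) \\
\text{s.t.} \quad &
\begin{cases}
\mathbf{1}_n^\top \left(\alpha^- - \alpha^+\right) = 0 \\
\mathbf{0}_n \leq \alpha^- \leq C \cdot \mathbf{1}_n \\
\mathbf{0}_n \leq \alpha^+ \leq C \cdot \mathbf{1}_n
\end{cases}
\end{aligned}
$$

where $\alpha^-$ and $\alpha^+$ are the Lagrange multipliers of the inequality constraints. We have $\hat{\beta} = \sum_{i=1}^n \left(\hat{\alpha}_i^+ - \hat{\alpha}_i^-\right) x_i$ and:

$$
\hat{\beta}_0 = \frac{1}{n_{SV}} \left(\sum_{i \in SV^-} \left(y_i + \varepsilon - x_i^\top \hat{\beta}\right) + \sum_{i \in SV^+} \left(y_i - \varepsilon - x_i^\top \hat{\beta}\right)\right)
$$

where $SV^- = \{i : 0 < \hat{\alpha}_i^- < C\}$ and $SV^+ = \{i : 0 < \hat{\alpha}_i^+ < C\}$ are the sets of negative and positive support vectors, and $n_{SV}$ is the number of support vectors.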


TABLE 15.20: Comparison of OLS, LAD and SVM estimates

                           C = 1, ε = 1        C = +∞, ε = 0
         OLS      LAD     LS-SVM   ε-SVM      LS-SVM   ε-SVM
β̂_0     3.446    2.331    3.389    3.262      3.446    2.331
β̂_1     1.544    1.893    1.542    1.631      1.544    1.893
β̂_2    -1.645   -1.735   -1.616   -1.526     -1.645   -1.735
β̂_3     2.895    2.908    2.885    2.726      2.895    2.908

We consider Example 100 on page 606, which has been used to illustrate the linear regression. In Table 15.20, we report OLS, LAD and SVM estimates for $C = 1$ and $\varepsilon = 1$. In the last two columns, we consider the limit cases, when the constant $C$ tends to $+\infty$ and $\varepsilon$ is equal to zero. We notice that the LS-SVM estimator converges to the OLS estimator. This is quite intuitive since we use a least squares loss function. In some sense, the LS-SVM regression can be seen as a ridge regression. When $C$ tends to $+\infty$, the ridge penalization disappears. More curiously, the $\varepsilon$-SVM estimator converges to the LAD estimator. In fact, the $\varepsilon$-SVM regression is close to a ridge quantile regression. When $\varepsilon$ is equal to zero, we obtain a median regression with an $L_2$ penalization. This is why the $\varepsilon$-SVM estimator converges to the LAD (or median regression) estimator.
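The ridge interpretation of the LS-SVM regression can be made explicit in the case of a single regressor. Substituting the constraint of Problem (15.31) into the objective function and writing the first-order conditions gives $\hat{\beta} = S_{xy} / \left(S_{xx} + 1/(2C)\right)$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta} \bar{x}$, where $S_{xx}$ and $S_{xy}$ are the centered sums of squares and cross-products. The sketch below implements this closed form; the data are hypothetical and are not those of Example 100.

```python
def ls_svm_1d(x, y, C):
    """LS-SVM regression with one regressor (closed form, our own derivation):
    minimize 0.5 * beta^2 + C * sum_i (y_i - beta0 - beta * x_i)^2.
    The intercept is not penalized, the slope is shrunk by 1 / (2C)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / (s_xx + 1.0 / (2.0 * C))
    beta0 = y_bar - beta * x_bar
    return beta0, beta


# Hypothetical data (illustrative only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ols = ls_svm_1d(x, y, float("inf"))      # 1 / (2C) = 0: no penalization
for C in (0.1, 1.0, 100.0, 1e9):
    print(C, ls_svm_1d(x, y, C))
print("OLS", ols)
```

The slope is shrunk towards zero for small values of $C$ and converges to the OLS slope when $C$ tends to $+\infty$, which is the behaviour observed in the last columns of Table 15.20.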


Non-linear support vector machines As we have previously seen, we can introduce non-linearity by replacing the input data $x$ by $\phi(x)$, where $\phi$ is a map from the $K$-dimensional input space to an $m$-dimensional non-linear feature space. In the case of SVM, we notice that the dual formulation generally requires the computation of the inner product $\langle x, x' \rangle$. This implies that we can use the same framework by replacing $\langle x, x' \rangle$ by $\langle \phi(x), \phi(x') \rangle$. Manipulating $\phi(x)$ can be tricky and not always obvious⁵⁷, because of the high dimension of the non-linear space. Sometimes, it is better to manipulate the inner product, which is called a kernel function $\mathcal{K}(x, x')$. For example, let us consider $x = (x_1, x_2)$ and $\phi(x) = \left(x_1^2, x_1 x_2, x_2 x_1, x_2^2\right)$. The corresponding kernel function is $\mathcal{K}(x, x') = \langle x, x' \rangle^2$. We also notice that two mapping functions can give the same kernel. For instance, $\mathcal{K}(x, x') = \langle x, x' \rangle^2$ can be generated by $\phi(x) = \left(x_1^2, \sqrt{2} x_1 x_2, x_2^2\right)$.

⁵⁶ See Exercise 15.4.8 on page 1028.
⁵⁷ The dimension $m$ is generally much larger than the original dimension.
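The fact that two different mapping functions generate the same kernel can be verified numerically. The sketch below compares the inner products in the two feature spaces with the direct evaluation of $\langle x, x' \rangle^2$; the test points are arbitrary and the function names are ours.

```python
import math


def dot(u, v) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def phi_a(x):
    """Four-dimensional mapping (x1^2, x1 x2, x2 x1, x2^2)."""
    x1, x2 = x
    return (x1 * x1, x1 * x2, x2 * x1, x2 * x2)


def phi_b(x):
    """Three-dimensional mapping (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def kernel(x, xp) -> float:
    """Kernel K(x, x') = <x, x'>^2, computed in the input space."""
    return dot(x, xp) ** 2


pairs = [((1.0, 2.0), (3.0, -1.0)),
         ((0.5, -0.2), (1.5, 4.0)),
         ((-2.0, 1.0), (-2.0, 1.0))]

gaps = []
for x, xp in pairs:
    k = kernel(x, xp)
    gaps.append(abs(dot(phi_a(x), phi_a(xp)) - k))
    gaps.append(abs(dot(phi_b(x), phi_b(xp)) - k))
print(max(gaps))
```

Evaluating the kernel only requires one inner product in the input space, whereas the explicit mappings require building the feature vectors first. This difference becomes material when the feature space is large.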


Since we have $\mathcal{K}(x, x') = \phi(x)^\top \phi(x')$, we see that the kernel function is symmetric (Bishop, 2006). This is the main property to define kernel functions. Another way to characterize a kernel is to verify that the kernel (or Gram) matrix $K = (K_{i,j})$, whose elements are $K_{i,j} = \mathcal{K}(x_i, x_j)$, is positive semi-definite. Therefore, we can directly construct kernels without specifying $\phi$. For instance, $e^{\mathcal{K}}$, $\mathcal{K} + c$, $c \mathcal{K}$ and $\mathcal{K}^d$ are also kernel functions when $c > 0$ and $d \in \mathbb{N}$. If $\mathcal{K}_1$ and $\mathcal{K}_2$ are two kernels, the sum $\mathcal{K}_1 + \mathcal{K}_2$ and the product $\mathcal{K}_1 \cdot \mathcal{K}_2$ are also kernel functions. The simplest kernel function is obtained by considering the identity function $\phi(x) = x$. It follows that $\langle x, x' \rangle + c$ and $\left(\langle x, x' \rangle + c\right)^d$ are also kernel functions. The latter is called the polynomial kernel and is very popular in SVM non-linear classification. Other popular kernel functions are the Gaussian (or radial basis function) kernel⁵⁸:

$$
\mathcal{K}(x, x') = \exp\left(-\frac{1}{2\sigma^2} \|x - x'\|_2^2\right)
$$

and the neural network (or sigmoid) kernel:

$$
\mathcal{K}(x, x') = \tanh\left(c_1 \langle x, x' \rangle + c_2\right)
$$


FIGURE 15.33: Transforming a non-linearly separable training set into a linearly separable training set


In order to understand the interest of kernels, we consider a training set⁵⁹, which is not linearly separable. In the left panel in Figure 15.33, we have represented the two input variables $x_1$ and $x_2$, and the response variable⁶⁰ $y$. Let us apply the polynomial mapping

⁵⁸ We can show that the dimension of the feature space is infinite: $\phi(x) = \left(\phi_0(x), \ldots, \phi_s(x), \ldots, \phi_\infty(x)\right)$ where:

$$
\phi_s(x) = \frac{1}{\sqrt{s! \, \sigma^{2s}}} \, x^s e^{-\frac{x^2}{2\sigma^2}}
$$


59The data are generated as follows: 71; = c1,; + r1, COSO; and z2, = fi (co; + r2, sin 6;) where 0; ~ 
Uo, 27]: In the case y = —1, we have c1, = C2, = 0, rii = T2445 ~ Uio,1] and fi (x) = x, otherwise we have 
cii = 1, c2, = 0, r1, ~ Uisg) 72,6 ~ Ujo, and fi (x) = |x| — 0.5. 

⁶⁰ $y = +1$ corresponds to a circle while $y = -1$ corresponds to a square.

$z = \phi(x) = \left(x_1^2 - 10, \sqrt{2} x_1 x_2, x_2^2\right)$. We have reported this transformation in the right panel in Figure 15.33. We observe that the training sets $(x, y)$ and $(z, y)$ are very different, since $(z, y)$ is linearly separable.


All the previous SVM algorithms are valid in the non-linear case and we obtain the following generic framework:

1. the first step consists of defining the mapping function $\phi$. Let $z_i = \phi(x_i)$ be the transformed data;

2. in the second step, we calculate the estimated parameters $\hat{\beta}_0$ and $\hat{\beta}$ in the feature space $\mathcal{Z}$;

3. finally, a new observation $x$ is classified by computing $\hat{y} = \operatorname{sign}\left(\hat{\beta}_0 + \phi(x)^\top \hat{\beta}\right)$; in the case of the SVM regression, we have $\hat{y} = \hat{\beta}_0 + \phi(x)^\top \hat{\beta}$.
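The previous framework can be simplified by considering the kernel function $\mathcal{K}$ instead of the mapping function $\phi$. Indeed, in the dual problems, the input variables are evaluated through the inner product $\langle \phi(x_i), \phi(x_j) \rangle$, which can be replaced⁶¹ by the kernel value $K_{i,j} = \mathcal{K}(x_i, x_j)$. The elements of the $\Gamma$ matrix used in the hard and soft margin QP problems become:

$$
\begin{aligned}
\Gamma_{i,j} &= y_i y_j \phi(x_i)^\top \phi(x_j) \\
&= y_i y_j K_{i,j}
\end{aligned}
$$

and we have $\Gamma = \left(y y^\top\right) \odot K$ where $\odot$ is the element-wise product and $K = (K_{i,j})$ is the Gram matrix. Since we have $\hat{\beta} = \sum_{i=1}^n \hat{\alpha}_i y_i \phi(x_i)$ and $\hat{\beta}_0 = n_{SV}^{-1} \sum_{j \in SV} \left(y_j - \phi(x_j)^\top \hat{\beta}\right)$, we deduce that $\hat{y} = \operatorname{sign} \hat{f}(x)$ where:

$$
\begin{aligned}
\hat{f}(x) &= \hat{\beta}_0 + \phi(x)^\top \hat{\beta} \\
&= \frac{1}{n_{SV}} \sum_{j \in SV} \left(y_j - \phi(x_j)^\top \sum_{i=1}^n \hat{\alpha}_i y_i \phi(x_i)\right) + \sum_{i=1}^n \hat{\alpha}_i y_i \phi(x)^\top \phi(x_i) \\
&= \frac{1}{n_{SV}} \sum_{j \in SV} \left(y_j - \sum_{i=1}^n \hat{\alpha}_i y_i \mathcal{K}(x_j, x_i)\right) + \sum_{i=1}^n \hat{\alpha}_i y_i \mathcal{K}(x, x_i)
\end{aligned}
$$

for a new feature $x$. The estimation of $\hat{y}$ involves the computation of $\mathcal{K}(x_j, x_i)$ and $\mathcal{K}(x, x_i)$. However, this expression can be reduced because most of the estimates $\hat{\alpha}_i$ are equal to zero.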

Remark 194 The derivation of the SVM non-linear regression is similar to the framework above, because the dual problem involves the computation of $\phi(X) \phi(X)^\top$, which is exactly equal to the Gram matrix $K$.


In Figure 15.33, we have shown that it was possible to transform the data in order to obtain separable training sets. For instance, the hyperplane $\mathcal{H}$, which is estimated using the hard margin classifier, is defined by:

$$
0.884 - 0.142 \cdot z_1 + 0.268 \cdot z_2 - 1.422 \cdot z_3 = 0
$$

or equivalently:

$$
0.884 - 0.142 \cdot \left(x_1^2 - 10\right) + 0.268 \sqrt{2} \cdot x_1 x_2 - 1.422 \cdot x_2^2 = 0
$$


⁶¹ The fact that we can easily substitute inner products by the Gram matrix in SVM classification and regression is called the kernel trick.

Let us now consider a Monte Carlo simulation. We assume that $X \sim \mathcal{N}(\mathbf{0}_4, I_4)$ and $Y = \operatorname{sign}\left(\mathcal{N}(0, 1)\right)$, meaning that there is no relationship between $X$ and $Y$. We simulate 300 observations for the training set, and we compute the hard margin classifier for several kernels: linear, quadratic and cubic polynomial with $c = 0$, and RBF ($\sigma = 50$ and $\sigma = 20$). Then, we estimate the predicted value $\hat{y}_i$ for all the observations and calculate the error rate. Since $Y$ is independent of $X$, the true error rate is equal to 50%, because the score is purely random. Using 500 replications, we have estimated the density function of the error rate in Figure 15.34. In-sample, the linear kernel classifier has the highest error rate, while the RBF kernel with $\sigma = 20$ has the lowest. On average, the error rate is respectively equal to 45.0%, 41.5%, 37.7%, 31.8% and 22.3%. Therefore, we have overfitted the model, and this is particularly true with the kernel approach. Indeed, if we consider a validation set, we obtain an average error rate of 50% whatever the kernel function we have used. We conclude that kernel functions are very powerful, but they can lead to large overfitting problems.
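The gap between in-sample and out-of-sample error rates on pure noise can be reproduced with any sufficiently flexible classifier. The sketch below deliberately replaces the kernel SVM by a 1-nearest-neighbour rule, which is much simpler to code without a QP solver; it is therefore not the experiment of the text, only an illustration of the same overfitting mechanism. The seed, sample sizes and function names are our own choices.

```python
import random

random.seed(0)


def simulate(n: int, dim: int = 4):
    """X ~ N(0, I_dim) and Y = sign(N(0, 1)), independent of X."""
    X = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    y = [1 if random.gauss(0.0, 1.0) > 0 else -1 for _ in range(n)]
    return X, y


def predict_1nn(x, X_train, y_train):
    """Label of the closest training point (squared Euclidean distance)."""
    best = min(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X_train[i])),
    )
    return y_train[best]


def error_rate(X, y, X_train, y_train) -> float:
    errors = sum(predict_1nn(xi, X_train, y_train) != yi for xi, yi in zip(X, y))
    return errors / len(y)


X_train, y_train = simulate(300)
X_valid, y_valid = simulate(300)

in_sample = error_rate(X_train, y_train, X_train, y_train)
out_of_sample = error_rate(X_valid, y_valid, X_train, y_train)
print(in_sample, out_of_sample)
```

The in-sample error rate is zero because each training point is its own nearest neighbour, whereas the error rate on the validation set is close to 50%, since there is nothing to learn.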


FIGURE 15.34: Probability density function of in-sample error rates (curves: linear, quadratic, cubic, RBF ($\sigma = 50$) and RBF ($\sigma = 20$) kernels; horizontal axis: error rate in %)


Extension to the multi-class problem We assume that we have $J$ disjoint classes $\mathcal{C}_j$ where $j = 1, \ldots, J$. SVMs are inherently two-class classifiers, and the extension to the multi-class problem is not straightforward. However, we distinguish two main approaches. The first approach uses binary classification. In the case of the ‘one-against-all’ strategy, we construct $J$ single SVM classifiers in order to separate the training data of each class from those of the other classes. For the $j$-th classifier, the response variable is then $z_i^{(j)} = +1$ if $y_i \in \mathcal{C}_j$ and $z_i^{(j)} = -1$ if $y_i \notin \mathcal{C}_j$. Using this modified training set, we can estimate the discriminant function $\hat{f}^{(j)}(x) = \hat{\beta}_0^{(j)} + x^\top \hat{\beta}^{(j)}$. In the two-class case, we have $\hat{y} = \operatorname{sign} \hat{f}(x)$. In the multi-class problem, the prediction corresponds to the binary classifier that gives the largest value $\hat{f}^{(j)}(x)$:

$$
\hat{y} \in \mathcal{C}_{j^\star} \quad \text{where} \quad j^\star = \arg\max_j \hat{f}^{(j)}(x)
$$

Another approach based on the binary classification is called the ‘one-against-one’ strategy. In this case, we construct $J(J-1)/2$ single SVM classifiers in order to separate the training data of the class $\mathcal{C}_j$ from those of the class $\mathcal{C}_{j'}$. Using the estimated discriminant function $\hat{f}^{(j|j')}(x) = \hat{\beta}_0^{(j|j')} + x^\top \hat{\beta}^{(j|j')}$, we can calculate the prediction $\hat{y}^{(j|j')} = \operatorname{sign} \hat{f}^{(j|j')}(x)$. The empirical probability that the observation belongs to the class $\mathcal{C}_j$ is then equal to:

$$
\hat{p}_j(x) = \frac{2}{J(J-1)} \sum_{j' \neq j} \mathbb{1}\left\{\hat{y}^{(j|j')} = +1\right\}
$$

We deduce that the classification rule is defined as follows:

$$
\hat{y} \in \mathcal{C}_{j^\star} \quad \text{where} \quad j^\star = \arg\max_j \hat{p}_j(x)
$$


The second approach of multi-classification extends the mathematical framework of SVM
that has been developed for the binary classification. The idea is then to consider a function
$f : \mathbb{R}^K \to \{1, \ldots, J\}$ such that $y = f(x)$, where:

$$f(x) = \arg\max_j \; \beta_0^{(j)} + x^\top \beta^{(j)}$$

We now have to estimate the $J \times 1$ vector $\beta_0$ and the $K \times J$ matrix $\beta$. Crammer and Singer
(2001) developed both hard and soft margin primal and dual problems in an elegant way.
For a review and a comparison of these different methods, the reader can refer to Hsu and
Lin (2002).
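
The voting step of the ‘one-against-one’ strategy can be sketched in a few lines. This is only an illustration: the function name, the dictionary of pairwise signs and the normalization of the votes by the number $J(J-1)/2$ of pairwise classifiers are assumptions of the sketch, and the pairwise SVM classifiers themselves are taken as given.

```python
def one_against_one_predict(pairwise_sign, n_classes):
    """Combine pairwise decisions into a class prediction by voting.

    pairwise_sign[(j, k)] is +1 if the classifier 'j against k' selects
    class j and -1 if it selects class k (hypothetical input format).
    """
    votes = [0] * n_classes
    for (j, k), sign in pairwise_sign.items():
        winner = j if sign > 0 else k
        votes[winner] += 1
    n_pairs = n_classes * (n_classes - 1) // 2
    # Empirical probability of each class: share of pairwise votes received
    p_hat = [v / n_pairs for v in votes]
    j_star = max(range(n_classes), key=lambda j: p_hat[j])
    return j_star, p_hat


# Three classes: class 0 beats class 1, class 2 beats classes 0 and 1
signs = {(0, 1): +1, (0, 2): -1, (1, 2): -1}
j_star, p_hat = one_against_one_predict(signs, 3)
```

Since each pairwise classifier votes for exactly one class, the empirical probabilities sum to one.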


15.2.3.4 Model averaging 


Model averaging (or ensemble averaging) combines multiple learning algorithms to ob- 
tain better predictive performance than could be obtained from the individual models. Two 
types of approaches are generally used. The first one constructs a family of ‘random’ models 
(bagging/random forests), whereas the second one generates a family of ‘adaptive’ models 
(boosting). 


The motivation of model averaging is to replace a single expert by a committee of 
experts. Sometimes, it is difficult to find a skilled expert, or the search is costly.
In this case, we can imagine that the work produced by this highly skilled expert can be
done by a committee of less skilled experts. For establishing the committee, we can choose 
(randomly) experts with similar skills or we can choose experts that are complementary. 
The parallel with model averaging is obvious when we distinguish random and adaptive 
models. 


62 We have $\hat{y}^{(j \mid j')} = +1 \Leftrightarrow \hat{y}^{(j' \mid j)} = -1$.


Bagging (bootstrap aggregation) Breiman (1996) proposed to use the bootstrap
method to improve the performance of weak learners, in particular to reduce their variance
and the overfitting bias. Given a training set $Z = \{(x_i, y_i), i = 1, \ldots, n\}$, the bagging
method generates $n_S$ bootstrapped training sets $Z_{(s)}$ and estimates the output function
$\hat{f}_{(s)}(x)$ for each training set. The $s^{\mathrm{th}}$ model is then defined by the pair $(Z_{(s)}, \hat{f}_{(s)})$. In the
case of regression, the predicted value is the mean of the predicted values of the different
models:

$$\hat{y} = \frac{1}{n_S} \sum_{s=1}^{n_S} \hat{f}_{(s)}(x)$$

In the case of classification, we generally implement the majority vote rule:

$$\hat{y} = \operatorname{MaxVote}\left(\hat{f}_{(1)}(x), \ldots, \hat{f}_{(n_S)}(x)\right)$$

In this approach, the predictions of each model are considered as a ‘vote’. The final prediction
corresponds to the class that has the maximum number of votes for multi-classification, or
the majority vote for binary classification. As shown by Breiman (1996), the bagging method
only makes sense when we consider non-linear models.
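
To make the bootstrap and the majority vote concrete, here is a minimal sketch of bagging for a binary classification problem. The weak learner is a one-variable threshold rule (a decision stump), which is an assumption chosen for brevity; the function names and the toy dataset are hypothetical.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a threshold rule on (x, y) pairs with y in {-1, +1}."""
    best = None
    for threshold, _ in sample:
        for direction in (+1, -1):
            errors = sum(
                1 for x, y in sample
                if (direction if x >= threshold else -direction) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, direction)
    _, threshold, direction = best
    return lambda x: direction if x >= threshold else -direction


def bagging_fit(data, n_models, rng):
    """Estimate one stump per bootstrapped training set."""
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n observations with replacement
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Majority vote over the predictions of the individual models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
data = [(x / 10, +1 if x >= 0 else -1) for x in range(-20, 21)]
models = bagging_fit(data, n_models=25, rng=rng)
```

An odd number of models avoids ties in the binary vote.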


FIGURE 15.35: An example of decision tree 


The bagging method is extensively used when considering decision trees. A tree is repre- 
sented by a series of binary splits. Each node represents a query, except the terminal nodes 
that correspond to the decision nodes. In the case of a classification tree, the output variable 
takes a discrete set of class labels, whereas the output variable takes continuous values when 
considering regression trees. In Figure 15.35, we report an example of a classification tree. 
We consider an applicant who would like a new credit. If the applicant does not have a job, the
credit is automatically refused if the amount of the loan is too high. If the amount of
the loan is less than 100, the final decision depends upon whether the applicant owns his
house. In this case, the client can obtain the credit if he applies for a mortgage or a home
equity line of credit. If the applicant has a job, the bank computes his credit score. If the
score is less than 500, the credit is rejected. Otherwise, the final decision depends on
the number of credits. If the applicant has fewer than 5 credits, the new credit is accepted,
otherwise it is refused. Decision trees are very popular in credit scoring for three main
reasons. First, they can handle different types of variables (numeric, continuous, discrete,
qualitative, etc.). Second, the rules and the decision process are very easy to understand.
Third, they can be estimated with statistical models, and adjusted by experts. In practice,
we use greedy approaches based on recursive binary splitting algorithms. One drawback of
classification trees is that their predictive power is generally lower than that observed
with logistic models, neural networks or support vector machines. We generally say that
they produce weak classifiers (or learners). However, by combining classification trees and
bagging, we can obtain the same performance as strong classifiers (Hastie et al., 2009).


Remark 195 Bagging is also extensively used when we have a large set of predictors. Instead
of running one logistic regression with all the input variables, we can estimate many
logit models with a limited number of explanatory variables (e.g. fewer than 10). In this
approach, the bootstrap procedure concerns the variables, not the observations. By construction,
the bagging model will produce better and more stable predictions than the single logit
model⁶³.


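Random forests Let $\hat{Y}_{(s)} = \hat{f}_{(s)}(X)$ be the output random variable produced by the $s^{\mathrm{th}}$
bootstrapped model. If we assume that $\hat{Y}_{(1)}, \ldots, \hat{Y}_{(n_S)}$ are iid random variables with mean
$\mu$ and variance $\sigma^2$, we have:

$$\operatorname{var}(\hat{Y}) = \operatorname{var}\left(\frac{1}{n_S} \sum_{s=1}^{n_S} \hat{Y}_{(s)}\right) = \frac{\sigma^2}{n_S}$$

where $\hat{Y}$ is the bagging estimator. We deduce that $\operatorname{var}(\hat{Y}) \to 0$ when $n_S \to \infty$. Theoretically,
the bagging method can highly reduce the variance of the prediction. However,
the hypothesis that $\hat{Y}_{(1)}, \ldots, \hat{Y}_{(n_S)}$ are not correlated is too strong. If we assume that the
average correlation between bootstrapped models is equal to $\rho$, we obtain:

$$\begin{aligned}
\operatorname{var}(\hat{Y}) &= \mathbb{E}\left[\left(\frac{1}{n_S} \sum_{s=1}^{n_S} \left(\hat{Y}_{(s)} - \mu\right)\right)^2\right] \\
&= \frac{1}{n_S^2}\, \mathbb{E}\left[\sum_{s=1}^{n_S} \left(\hat{Y}_{(s)} - \mu\right)^2 + \sum_{r \neq s} \left(\hat{Y}_{(r)} - \mu\right)\left(\hat{Y}_{(s)} - \mu\right)\right] \\
&= \frac{n_S \sigma^2}{n_S^2} + \frac{n_S (n_S - 1) \rho \sigma^2}{n_S^2} \\
&= \rho \sigma^2 + \frac{(1 - \rho)}{n_S} \sigma^2
\end{aligned}$$

It follows that $\operatorname{var}(\hat{Y}) \geq \rho \sigma^2$. For example, if $\rho = 90\%$, the maximum reduction of the variance
is only 10%. It follows that the improvement due to the bagging method can be highly
limited when the correlation is high. Breiman (2001) proposed a modification of bagging by
building de-correlated trees. At each iteration s, we randomly select a subset of predictors
$X_{(s)}$, implying that the model is then defined by the 3-tuple $(Z_{(s)}, X_{(s)}, \hat{f}_{(s)})$. Generally,
the randomization step is done with a fixed number $K^\star$ of bootstrapped predictors⁶⁴.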


63 For instance, if we consider the degenerate case when the number of observations is lower than the 
number of predictors (n < K), the single logit model is highly noisy, which is not the case of the bagging 
model. 

64 The recommended default value is $K^\star = \sqrt{K}$ for classification and $K^\star = K/3$ for regression (Hastie et
al., 2009).

Remark 196 The method of random forests can be viewed as a double bagging method.
Indeed, it mixes observation-based and feature-based bagging methods. 
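
The lower bound on the variance of the bagging estimator can be checked numerically. The sketch below evaluates the double sum of covariances for $n_S$ equicorrelated predictions and compares it with the closed form $\rho \sigma^2 + (1 - \rho) \sigma^2 / n_S$; the function names are hypothetical and a constant pairwise correlation is assumed.

```python
def bagging_variance(rho, sigma2, n_models):
    """Variance of the average of n_models equicorrelated variables,
    obtained by summing all the entries of the covariance matrix."""
    total = 0.0
    for r in range(n_models):
        for s in range(n_models):
            total += sigma2 if r == s else rho * sigma2
    return total / n_models ** 2


def closed_form(rho, sigma2, n_models):
    """Closed-form expression rho * sigma^2 + (1 - rho) * sigma^2 / n_S."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_models


# With rho = 90%, averaging 50 models only reduces the variance by about 10%
v_exact = bagging_variance(0.9, 1.0, 50)
v_formula = closed_form(0.9, 1.0, 50)
```

With $\rho = 90\%$ and $\sigma^2 = 1$, the variance stays close to 0.90 however many models are averaged.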


Boosting In this approach, the training set $(Z, W)$ is defined by including the weight of
each observation:

$$(Z, W) = \{(x_i, y_i, w_i), i = 1, \ldots, n\}$$

At each iteration s, boosting computes adaptive weights $W_{(s)}$ and fits the learning algorithm
$\hat{f}_{(s)}$ with the training set $(Z, W_{(s)})$. Then, it combines the different learning models through
a weighting rule:

$$\hat{y} = \hat{f}(x) = \operatorname{Avg}_{s=1}^{n_S}\left(w_s \cdot \hat{f}_{(s)}(x)\right)$$

where $w_s$ is the weight of the $s^{\mathrm{th}}$ learning model and Avg is the averaging function. In the
case of a binary classification, we have:

$$\hat{y} = \operatorname{sign}\left(\sum_{s=1}^{n_S} w_s \cdot \hat{f}_{(s)}(x)\right)$$

The concept of boosting was introduced by Schapire (1990), who proved that a ‘weak’
learning algorithm can be ‘boosted’ into a ‘strong’ learning algorithm⁶⁵. In the 1990s, many
boosting algorithms were developed, but wide recognition came with the adaptive
boosting method proposed by Freund and Schapire (1997), and described in Algorithm 2.

The algorithm concerns the classification problem $y \in \{-1, +1\}$. We begin by initializing
the observation weights $w_{i,1}$ to $1/n$. Then, we fit the classifier $\hat{f}_{(1)}$ using the training set Z,
because the initial weights have no impact. The first step is the usual manner to fit a
classification model. To improve the accuracy, boosting constructs at iteration s another
training set by calculating new observation weights:

$$w_{i,s+1} = \begin{cases} w_{i,s} & \text{if } i \text{ is well-classified} \\ w_{i,s}\, e^{w_s} & \text{otherwise} \end{cases}$$

If the observation i is well-classified, the weight remains the same, otherwise it increases:
$w_{i,s+1} > w_{i,s}$. Indeed, the update only makes sense if the error rate $\mathcal{L}_{(s)}$ is smaller than
50%, implying that $w_s$ is strictly positive⁶⁶. At iteration s + 1, the classifier will be fitted
with the training set $(Z, W_{(s+1)})$, where the misclassified observations at iteration s are
more weighted than the well-classified observations. Therefore, the weighting scheme $W_{(s+1)}$
forces the new classifier $\hat{f}_{(s+1)}$ to focus more on the training observations that are difficult
to classify. Finally, we use the majority vote to predict y:

$$\hat{y} = \operatorname{sign}\left(\sum_{s=1}^{n_S} w_s \cdot \hat{f}_{(s)}(x)\right)$$


We represent the classifier weight $w_s$ with respect to the loss function (or the error rate) in
Figure 15.36. If the error rate is equal to 50%, the weight $w_s$ of the $s^{\mathrm{th}}$ classifier is equal
to zero. This classifier does not contribute to the final model, because it corresponds to a
random guessing model. On the contrary, if the error rate of one classifier is equal to zero,
its allocation is infinite in the final model. In Figure 15.36, we also show the impact of the


65 A training set is said to be strongly learnable if “there exists a polynomial-time algorithm that achieves 
low error with high confidence” for all the observations (Schapire, 1990). A weak learning algorithm performs 
just slightly better than a random learning algorithm. 

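66 If the classifier has an error rate greater than 50%, it performs worse than random guessing.


Algorithm 2 AdaBoost.M1 binary classifier

Estimate the AdaBoost.M1 classifier $\hat{y} = \hat{f}(x)$
Initialize the observation weights $w_{i,1} = 1/n$ for $i = 1, \ldots, n$
for $s = 1 : n_S$ do
    $W_{(s)} \leftarrow (w_{1,s}, \ldots, w_{n,s})$
    Fit the classifier $\hat{f}_{(s)}$ using the training set $(Z, W_{(s)})$
    Compute the loss function:
        $\mathcal{L}_{(s)} \leftarrow \dfrac{\sum_{i=1}^n w_{i,s} \cdot \mathbb{1}\{y_i \neq \hat{f}_{(s)}(x_i)\}}{\sum_{i=1}^n w_{i,s}}$
    Calculate the classifier weight $w_s$:
        $w_s \leftarrow \ln\left(\dfrac{1 - \mathcal{L}_{(s)}}{\mathcal{L}_{(s)}}\right)$
    Update the observation weights:
        $w_{i,s+1} \leftarrow w_{i,s}\, e^{w_s \cdot \mathbb{1}\{y_i \neq \hat{f}_{(s)}(x_i)\}}$
    Normalize the observation weights:
        $w_{i,s+1} \leftarrow \dfrac{w_{i,s+1}}{\sum_{i'=1}^n w_{i',s+1}}$
end for
return $\hat{f}(x) = \operatorname{sign}\left(\sum_{s=1}^{n_S} w_s \hat{f}_{(s)}(x)\right)$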


error rate on the weights $w_{i,s+1}$ when we consider a sample of two observations. We assume
that the first observation is misclassified while the second observation is well-classified at
step s. This implies that the first observation will have more weight at step s + 1.
The re-weighting of observations also depends on the error rate. If the error rate of the $s^{\mathrm{th}}$
model is low, the re-weighting is strong, in order to separate well-classified and misclassified
observations. The error rate is not necessarily a monotonic function of the iteration
s. At the beginning, the error rate can increase or decrease depending on whether the initial
classifier is good or bad. But, at the end, the error rate must reach the upper bound 50%.
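
The AdaBoost.M1 loop can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the weak learner is a weighted decision stump (the text uses a logit model in its example), the function names are hypothetical, and the loop stops early when the weighted error rate reaches 0 or 50%.

```python
import math


def fit_weighted_stump(xs, ys, weights):
    """Threshold rule minimizing the weighted misclassification rate."""
    best = None
    for threshold in xs:
        for direction in (+1, -1):
            loss = sum(
                w for x, y, w in zip(xs, ys, weights)
                if (direction if x >= threshold else -direction) != y
            )
            if best is None or loss < best[0]:
                best = (loss, threshold, direction)
    loss, threshold, direction = best
    return (lambda x: direction if x >= threshold else -direction), loss


def adaboost_m1(xs, ys, n_models):
    n = len(xs)
    weights = [1.0 / n] * n                  # uniform initial weights
    ensemble = []
    for _ in range(n_models):
        classifier, loss = fit_weighted_stump(xs, ys, weights)
        if loss <= 0.0:
            # A perfect classifier would receive an infinite weight
            ensemble = [(1.0, classifier)]
            break
        if loss >= 0.5:
            break                            # no better than random guessing
        w_s = math.log((1.0 - loss) / loss)  # classifier weight
        ensemble.append((w_s, classifier))
        # Increase the weight of misclassified observations, then normalize
        weights = [
            w * math.exp(w_s) if classifier(x) != y else w
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(w * f(x) for w, f in ensemble) >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
single = adaboost_m1(xs, ys, 1)
boosted = adaboost_m1(xs, ys, 3)
```

On this toy sample, a single stump misclassifies three observations, whereas the combination of three boosted stumps classifies all ten observations correctly.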

FIGURE 15.36: Weighting schemes of the boosting approach


In order to illustrate the boosting method, we consider the data given in Table 15.21
and the logit model:

$$\Pr\{y_i = 1\} = F(\beta_0 + \beta_1 x_i)$$

where $F(x)$ is the logistic function. We have $\Pr\{y_i = -1\} = 1 - F(\beta_0 + \beta_1 x_i)$. Given the
pattern x, the classification rule is then:

$$\hat{y} = 2 \cdot \mathbb{1}\left\{F(\hat{\beta}_0 + \hat{\beta}_1 x) > \frac{1}{2}\right\} - 1$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the parameters estimated by the method of maximum
likelihood. Using our data, we obtain $\hat{\beta}_0 = 0.4133$ and $\hat{\beta}_1 = 0.2976$, and the error rate is
equal to 45%. In the case of the boosting algorithm, the first iteration is exactly the same
as the previous logit estimation. For the second iteration, we have to calculate the weights
$w_{i,2}$ of each observation. We have $w_1 = 0.2007$ because $\mathcal{L}_{(1)} = 45\%$. Therefore, we update
the weights. In Table 15.21, we have reported the predicted value $\hat{y}_{i,s}$ at the iteration step
s, and also the variable $\mathcal{L}_{i,s}$, which indicates the misclassified observations. For instance,
observations 3, 4, 5, 7, 9, 10, 12, 16 and 17 are not well-classified at the first iteration by
the logit model. This is why the weight of these observations is multiplied by $e^{w_1}$. While the
weights $w_{i,1}$ take the uniform value of 5%, the weights $w_{i,2}$ differ across
observations. After normalizing, $w_{i,2}$ is equal to 5.56% for observations that are misclassified
at the first iteration, otherwise it is equal to 4.55%. Using these weights, we estimate the
logit model, and find $\hat{\beta}_{0,2} = 0.2334$ and $\hat{\beta}_{1,2} = 0.2558$ (Table 15.22). The loss function
is then equal to $\mathcal{L}_{(2)} = 38.89\%$. We see that the second logit model has improved the
classification for two observations (i = 3 and i = 10). We can continue the algorithm. In
our example, the boosting method stops after 5 iterations, because $\mathcal{L}_{(5)} = 50.00\%$. The
fifth estimated classifier is then a pure random guessing model. While the number of well-classified
observations is equal to 11 for the logit model, it is equal to 13 for the boosting
model⁶⁷. From a general point of view, boosting is interesting only if we use a large
dataset of observations and variables. When considering small datasets, we face an obvious
overfitting issue.
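
The arithmetic of the first weight update in this example can be reproduced directly. The short sketch below takes as given the list of observations misclassified at the first iteration (as quoted in the text) and applies the update and normalization steps of Algorithm 2; the variable names are hypothetical.

```python
import math

n = 20
# Observations misclassified by the first logit model (quoted in the text)
misclassified = {3, 4, 5, 7, 9, 10, 12, 16, 17}
weights = {i: 1.0 / n for i in range(1, n + 1)}

# Weighted error rate and classifier weight of the first iteration
loss = sum(weights[i] for i in misclassified) / sum(weights.values())
w_1 = math.log((1.0 - loss) / loss)

# Multiply the weights of misclassified observations by exp(w_1), then normalize
updated = {
    i: w * math.exp(w_1) if i in misclassified else w
    for i, w in weights.items()
}
total = sum(updated.values())
normalized = {i: w / total for i, w in updated.items()}
```

The results agree with the figures reported in the text: the error rate is 45%, the classifier weight is 0.2007, and the normalized weights are 5.56% for the misclassified observations and 4.55% for the others.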


Remark 197 The boosting method is based on weighted estimation methods. In Chapter
10, we have already defined the weighted least squares estimator⁶⁸. In Exercise 15.4.10 on
page 1029, we extend the method of maximum likelihood, neural networks and support vector 
machines when observations are weighted. 


Hastie et al. (2009) showed that boosting is related to additive models: 


$$\hat{g}(x) = \sum_{s=1}^{n_S} \beta_{(s)} B(x; \gamma_{(s)})$$


67 The boosting classifier corresponds to the column $\hat{y}_i$ in Table 15.21.
68 See Section 10.1.1.5 on page 612.


TABLE 15.21: Illustration of the boosting algorithm ($n_S = 2$)

(w_{i,1} and w_{i,2} are expressed in %; a check mark ✓ in the columns L_{i,1}, L_{i,2} and L_i indicates a misclassified observation)

 i   y_i     x_i    w_{i,1}  ŷ_{i,1}  L_{i,1}   w_{i,2}  ŷ_{i,2}  L_{i,2}   ŷ_i   L_i
 1    1     0.597    5.00       1                4.55       1                1
 2    1     1.496    5.00       1                4.55       1                1
 3   −1    −0.914    5.00       1       ✓        5.56      −1               −1
 4   −1    −0.497    5.00       1       ✓        5.56       1       ✓        1     ✓
 5   −1     0.493    5.00       1       ✓        5.56       1       ✓        1     ✓
 6    1     0.841    5.00       1                4.55       1                1
 7   −1    −0.885    5.00       1       ✓        5.56       1       ✓        1     ✓
 8    1     1.418    5.00       1                4.55       1                1
 9   −1    −0.183    5.00       1       ✓        5.56       1       ✓        1     ✓
10   −1    −1.298    5.00       1       ✓        5.56      −1               −1
11    1    −0.324    5.00       1                4.55       1                1
12    1    −1.454    5.00      −1       ✓        5.56      −1       ✓       −1     ✓
13    1    −0.270    5.00       1                4.55       1                1
14    1    −0.770    5.00       1                4.55       1                1
15    1     0.232    5.00       1                4.55       1                1
16   −1     0.970    5.00       1       ✓        5.56       1       ✓        1     ✓
17   −1     1.196    5.00       1       ✓        5.56       1       ✓        1     ✓
18    1     0.578    5.00       1                4.55       1                1
19    1    −0.686    5.00       1                4.55       1                1
20    1    −0.590    5.00       1                4.55       1                1

TABLE 15.22: Estimated model at each boosting iteration ($n_S = 5$)

s                 1         2         3         4         5
β̂_{0,s}        0.4133    0.2334   −0.0771    0.0009    0.0103
β̂_{1,s}        0.2976    0.2558    0.0278    0.0277   −0.0751
L_(s)           0.4500    0.3889    0.4805    0.4741    0.5000
w_s             0.2007    0.4520    0.0780    0.1038    0.0000


where $\beta_{(s)}$ is the expansion coefficient and $B(x; \gamma_{(s)})$ is the basis function at iteration s.
Forward stagewise regression consists in finding the optimal values of $\beta_{(s)}$ and $\gamma_{(s)}$:

$$\left(\hat{\beta}_{(s)}, \hat{\gamma}_{(s)}\right) = \arg\min \sum_{i=1}^n \mathcal{L}\left(y_i, \hat{g}_{(s-1)}(x_i) + \beta_{(s)} B(x_i; \gamma_{(s)})\right)$$

where $\hat{g}_{(s)}(x) = \sum_{s'=1}^{s} \hat{\beta}_{(s')} B(x; \hat{\gamma}_{(s')})$ and $\mathcal{L}$ is the loss function. In the case of boosting,
we can show that $B(x; \gamma_{(s)}) = \hat{f}_{(s)}(x)$, $\beta_{(s)} = w_s$ and $\mathcal{L}(y, f(x)) = e^{-y f(x)}$. We
recognize an additive logit model with the exponential loss function. Using this framework,
Friedman (2002) proposed gradient boosting models. The idea is to minimize the loss function
$\sum_{i=1}^n \mathcal{L}(y_i, f(x_i))$ with respect to the learning algorithm $f(x)$. The steepest descent
algorithm consists in the following iterations:

$$\hat{f}_{(s)}(x_i) = \hat{f}_{(s-1)}(x_i) - \eta_{(s)} \frac{\partial\, \mathcal{L}\left(y_i, \hat{f}_{(s-1)}(x_i)\right)}{\partial\, \hat{f}_{(s-1)}(x_i)}$$

Instead of finding the optimal classifier $\hat{f}_{(s)}$, gradient boosting estimates the optimal step
$\eta_{(s)}$ and iterates the previous formula. Finally, the optimal model $\hat{f}(x)$ is given by the
estimate $\hat{f}_{(n_S)}(x)$ at the last iteration.


Remark 198 The table below summarizes the differences between bagging, random forests 
and boosting: 


Method            Model                                  Z_(s)   X_(s)   W_(s)   w_s
Bagging           $(Z_{(s)}, \hat{f}_{(s)})$                 ✓
Random forests    $(Z_{(s)}, X_{(s)}, \hat{f}_{(s)})$        ✓       ✓
Boosting          $(Z, W_{(s)}, \hat{f}_{(s)})$                              ✓       ✓


In the bagging method, the randomization step concerns observations. In the case of random 
forests, the models are generated by randomizing both observations and variables. Boosting 
is a very different approach, since all the observations and variables are used to construct the 
weak learning models. In this method, the perturbations are introduced by using a weighting 
scheme for the observations that changes at each iteration. The randomization step is then 
replaced by an adaptive step, where the $(s+1)^{\mathrm{th}}$ model depends on the accuracy of the $s^{\mathrm{th}}$
model. Finally, boosting uses a weighted average of the different weak learning algorithms. 


15.3 Performance evaluation criteria and score consistency 


This section is dedicated to the performance assessment of a score. Using information 
theory, we would like to know if the scoring system is informative or not. The second 
paragraph presents the graphical tools in order to measure the classification accuracy of the 
score. Finally, we define the different statistical measures to estimate the performance of 
the score. We also notice that the tools presented here can be used with either the training
set or the validation set.


15.3.1 Shannon entropy
15.3.1.1 Definition and properties


The entropy is a measure of unpredictability or uncertainty of a random variable. Let
$(X, Y)$ be a random vector where $p_{i,j} = \Pr\{X = x_i, Y = y_j\}$, $p_i = \Pr\{X = x_i\}$ and $p_j =
\Pr\{Y = y_j\}$. The Shannon entropy of the discrete random variable X is given by⁶⁹:

$$H(X) = -\sum_{i=1}^n p_i \ln p_i$$

We have the property $0 \leq H(X) \leq \ln n$. H is equal to zero if there is a state i such that
$p_i = 1$ and is equal to $\ln n$ in the case of the uniform distribution ($p_i = 1/n$). The Shannon
entropy is a measure of the average information of the system. The lower the Shannon
entropy, the more informative the system. For a random vector $(X, Y)$, we have:

$$H(X, Y) = -\sum_i \sum_j p_{i,j} \ln p_{i,j}$$

We deduce that the conditional information of Y given X is equal to:

$$\begin{aligned}
H(Y \mid X) &= \mathbb{E}_X\left[H(Y \mid X = x)\right] \\
&= -\sum_i \sum_j p_{i,j} \ln \frac{p_{i,j}}{p_i} \\
&= H(X, Y) - H(X)
\end{aligned}$$


We have the following properties: 


• if X and Y are independent, we have $H(Y \mid X) = H(Y)$ and $H(X, Y) = H(Y) +
H(X)$;


• if X and Y are perfectly dependent, we have $H(Y \mid X) = 0$ and $H(X, Y) = H(X)$.


The amount of information obtained about one random variable, through the other random 
variable is measured by the mutual information: 


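$$\begin{aligned}
I(X, Y) &= H(Y) + H(X) - H(X, Y) \\
&= \sum_i \sum_j p_{i,j} \ln \frac{p_{i,j}}{p_i\, p_j}
\end{aligned}$$


Figure 15.37 shows some examples of Shannon entropy calculation. For each example, we
indicate the probabilities $p_{i,j}$ and the values taken by $H(X)$, $H(Y)$, $H(X, Y)$ and $I(X, Y)$.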
The top/left panel corresponds to a diffuse system. The value of H (X,Y) is maximum, 
meaning that the system is extremely disordered. The top/right panel represents a highly 
ordered system in the bivariate case and a diffuse system in the univariate case. We have 
H(X |Y)=H(Y | X) =0, implying that the knowledge of X is sufficient to find the state 
of Y. Generally, the system is not perfectly ordered or perfectly disordered. For instance, 
in the case of the system described in the bottom/left panel, the knowledge of X informs 
us about the state of Y. Indeed, if X is in the third state, then we know that Y cannot be 
in the first or sixth state. Another example is provided in the bottom/right panel. 
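
The quantities $H(X)$, $H(Y)$, $H(X, Y)$ and $I(X, Y)$ are straightforward to compute from a matrix of joint probabilities. The sketch below uses natural logarithms and the convention $p \ln p = 0$ when $p = 0$; the function names and the two small joint distributions are hypothetical and are not those of Figure 15.37.

```python
import math


def entropy(probabilities):
    """Shannon entropy with the convention p * ln(p) = 0 when p = 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def mutual_information(joint):
    """I(X, Y) = H(X) + H(Y) - H(X, Y) for a matrix of joint probabilities."""
    p_x = [sum(row) for row in joint]            # marginal distribution of X
    p_y = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    h_xy = entropy([p for row in joint for p in row])
    return entropy(p_x) + entropy(p_y) - h_xy


# Independent case: knowing X says nothing about Y
independent = [[0.25, 0.25], [0.25, 0.25]]
# Perfectly dependent case: knowing X determines Y
dependent = [[0.5, 0.0], [0.0, 0.5]]

i_independent = mutual_information(independent)
i_dependent = mutual_information(dependent)
```

In the independent case the mutual information is zero, while in the perfectly dependent case it is equal to $H(X) = \ln 2$.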


Remark 199 If we apply the Shannon entropy to the transition matrix of a Markov chain,
we set $X = R(s)$ and $Y = R(t)$ where $R(t)$ is the state variable at the date t. We obtain:


$$H(R(t) \mid R(s)) = -\sum_{i=1}^K \pi_i^\star \sum_{j=1}^K p_{i,j}^{(t-s)} \ln p_{i,j}^{(t-s)}$$


where $p_{i,j} = \Pr\{R(t+1) = j \mid R(t) = i\}$, $S = \{1, 2, \ldots, K\}$ is the state space of the
Markov chain and $\pi^\star$ is the associated stationary distribution.


69 We use the convention $p_i \ln p_i = 0$ when $p_i$ is equal to zero.


FIGURE 15.37: Examples of Shannon entropy calculation


15.3.1.2 Application to scoring 


Let S and Y be the score and the control variable. For instance, Y is a binary random 
variable that may indicate a bad credit (Y = 0) or a good credit (Y = 1). Y may also 
correspond to classes defined by some quantiles. With Shannon entropy, we can measure 
the information of the system (S, Y). We can also compare two scores $S_1$ and $S_2$ by using
the statistical measures $I(S_1, Y)$ and $I(S_2, Y)$. Let $S_3$ be the aggregated score obtained
from the two individual scores $S_1$ and $S_2$. We can calculate the information contribution
of each score with respect to the global score. Therefore, we can verify that a score really
adds information.


We consider the following decision rule:

$$\begin{cases} S \leq 0 \Rightarrow S^\star = 0 \\ S > 0 \Rightarrow S^\star = 1 \end{cases}$$

We note $n_{i,j}$ the number of observations such that $S^\star = i$ and $Y = j$. We obtain the
following system $(S^\star, Y)$:

           |  Y = 0     Y = 1
  S* = 0   |  n_{0,0}   n_{0,1}
  S* = 1   |  n_{1,0}   n_{1,1}

where $n = n_{0,0} + n_{0,1} + n_{1,0} + n_{1,1}$ is the total number of observations. The hit rate is the
ratio of good bets:

$$H = \frac{n_{0,0} + n_{1,1}}{n}$$

This statistic can be viewed as an information measure of the system (S, Y). When there
are more states, we can consider the Shannon entropy. In Figure 15.38, we report the
contingency table of two scores $S_1$ and $S_2$ for 100 observations⁷⁰. We have $I(S_1, Y) = 0.763$
and $I(S_2, Y) = 0.636$. We deduce that $S_1$ is more informative than $S_2$.


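[Figure 15.38 panels: contingency tables of $S_1$ (left, $I(S_1, Y) = 0.763$) and $S_2$ (right, $I(S_2, Y) = 0.636$), with rows $s_1, \ldots, s_6$ and columns $y_1, \ldots, y_5$]


FIGURE 15.38: Scorecards $S_1$ and $S_2$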


15.3.2 Graphical methods 


We assume that the control variable Y can take two values: Y = 0 corresponds to a bad
risk (or bad signal) while Y = 1 corresponds to a good risk (or good signal). Gouriéroux
(1992) introduced 3 graphical tools for assessing the quality of a score: the performance
curve, the selection curve and the discrimination curve⁷¹. In the following, we assume that
the probability $\Pr\{Y = 1 \mid S > s\}$ is increasing with respect to the level $s \in [0, 1]$, which
corresponds to the rate of acceptance. We deduce that the decision rule is the following:


• if the score of the observation is above the threshold s, the observation is selected;


• if the score of the observation is below the threshold s, the observation is not selected.


If s is equal to one, we select no observation. If s is equal to zero, we select all the observations.
In a scoring system, the threshold s is given. Below, we assume that s is varying and
we analyze the relevance of the score with respect to this parameter.


70 Each score is divided into 6 intervals ($s_1, \ldots, s_6$) while the dependent variable is divided into 5 intervals
($y_1, \ldots, y_5$).
71 See also Gouriéroux and Jasiak (2007).


15.3.2.1 Performance curve, selection curve and discriminant curve

The performance curve is the parametric function $y = \mathcal{P}(x)$ defined by:

$$\begin{cases} x(s) = \Pr\{S > s\} \\ y(s) = \dfrac{\Pr\{Y = 0 \mid S > s\}}{\Pr\{Y = 0\}} \end{cases}$$

where $x(s)$ corresponds to the proportion of selected observations and $y(s)$ corresponds to
the ratio between the proportion of selected bad risks and the proportion of bad risks in
the population. The score is efficient if the ratio is below one. If $y(s) > 1$, the score selects
more bad risks than those we can find in the population⁷². If $y(s) = 1$, the score is random
and the performance is equal to zero. In this case, the selected population is representative
of the total population.


The selection curve is the parametric curve $y = \mathcal{S}(x)$ defined by:

$$\begin{cases} x(s) = \Pr\{S > s\} \\ y(s) = \Pr\{S > s \mid Y = 0\} \end{cases}$$

where $y(s)$ corresponds to the ratio of observations that are wrongly selected. By construction,
we would like the curve $y = \mathcal{S}(x)$ to be located below the bisecting line $y = x$ in
order to verify that $\Pr\{S > s \mid Y = 0\} \leq \Pr\{S > s\}$.


Remark 200 The performance and selection curves are related as follows⁷³:

$$\mathcal{S}(x) = x\, \mathcal{P}(x)$$

The discriminant curve is the parametric curve $y = \mathcal{D}(x)$ defined by:

$$\mathcal{D}(x) = g_1\left(g_0^{-1}(x)\right)$$

where:

$$g_y(s) = \Pr\{S > s \mid Y = y\}$$

It represents the proportion of good risks in the selected population with respect to the
proportion of bad risks in the selected population. The score is said to be discriminant if
the curve $y = \mathcal{D}(x)$ is located above the bisecting line $y = x$.
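
For a finite sample, the three curves can be evaluated at a given threshold by replacing the probabilities with empirical frequencies. The sketch below is an illustration with hypothetical names and data; labels are coded 0 for a bad risk and 1 for a good risk, and an observation is selected when its score is strictly above the threshold.

```python
def curve_points(scores, labels, threshold):
    """Empirical points of the performance, selection and discriminant
    curves at a given threshold (labels: 0 = bad risk, 1 = good risk)."""
    n = len(scores)
    selected = [y for s, y in zip(scores, labels) if s > threshold]
    n_bad = labels.count(0)
    n_good = n - n_bad
    sel_bad = selected.count(0)
    x = len(selected) / n                            # Pr{S > s}
    selection = sel_bad / n_bad                      # Pr{S > s | Y = 0}
    good_rate = (len(selected) - sel_bad) / n_good   # Pr{S > s | Y = 1}
    if selected:
        # Pr{Y = 0 | S > s} / Pr{Y = 0}
        performance = (sel_bad / len(selected)) / (n_bad / n)
    else:
        performance = float("nan")                   # undefined if nothing is selected
    return x, performance, selection, good_rate


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
x, performance, selection, good_rate = curve_points(scores, labels, 0.35)
```

The pair (x, performance) is a point of the performance curve, (x, selection) a point of the selection curve, and (selection, good_rate) a point of the discriminant curve. On this sample, the selection value is equal to x times the performance value, as stated in Remark 200.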


72 In this case, we have $\Pr\{Y = 0 \mid S > s\} > \Pr\{Y = 0\}$.
73 We have:

$$\begin{aligned}
\Pr\{S > s \mid Y = 0\} &= \frac{\Pr\{S > s, Y = 0\}}{\Pr\{Y = 0\}} \\
&= \Pr\{S > s\} \cdot \frac{\Pr\{S > s, Y = 0\}}{\Pr\{S > s\} \Pr\{Y = 0\}} \\
&= \Pr\{S > s\} \cdot \frac{\Pr\{Y = 0 \mid S > s\}}{\Pr\{Y = 0\}}
\end{aligned}$$


15.3.2.2 Some properties


We first notice that the previous parametric curves do not depend on the probability 
distribution of the score S, but only on the ranking of the observations. They are then 
invariant if we apply an increasing function to the score. Gouriéroux (1992) also established 
the following properties: 


1. the performance curve (respectively, the selection curve) is located below the line
y = 1 (respectively, the bisecting line y = x) if and only if $\operatorname{cov}(f(Y), g(S)) \geq 0$ for
any increasing functions f and g;


2. the performance curve is increasing if and only if:

$$\operatorname{cov}(f(Y), g(S) \mid S \geq s) \geq 0$$

for any increasing functions f and g, and any threshold level s;


3. the selection curve is convex if and only if $\mathbb{E}[f(Y) \mid S = s]$ is increasing with respect
to the threshold level s for any increasing function f.


Remark 201 The first property is the least restrictive. It allows us to verify that the score 
S is better than a random score. We can show that (3) ⇒ (2) ⇒ (1). The last property is
then the most restrictive. 


A score is perfect or optimal if there is a threshold level $s^\star$ such that
$\Pr\{Y = 1 \mid S > s^\star\} = 1$ and $\Pr\{Y = 0 \mid S \leq s^\star\} = 1$. It separates the population between
good and bad risks. Graphically, the selection curve of a perfect score is equal to:

$$y = \mathbb{1}\{x > \Pr\{Y = 1\}\} \cdot \left(1 + \frac{x - 1}{\Pr\{Y = 0\}}\right)$$

Using the relationship $\mathcal{S}(x) = x\, \mathcal{P}(x)$, we deduce that the performance curve of a perfect
score is given by:

$$y = \mathbb{1}\{x > \Pr\{Y = 1\}\} \cdot \left(\frac{x - \Pr\{Y = 1\}}{x \cdot \Pr\{Y = 0\}}\right)$$

For the discriminant curve, a perfect score satisfies $\mathcal{D}(x) = 1$. When the score is random, we
have $\mathcal{S}(x) = \mathcal{D}(x) = x$ and $\mathcal{P}(x) = 1$. In Figure 15.39, we have reported the performance,
selection and discriminant curves of a given score S. We also show the curves obtained with
an optimal (or perfect) score and a random score. A score must be located in the area
between the curve computed with a random score and the curve computed with a perfect
score, except if the score ranks the observations in a worse way than a random score.


Gouriéroux (1992) also established two properties for comparing two scores $S_1$ and $S_2$:


• the score $S_1$ performs better on the population $P_1$ than the score $S_2$ on the population
$P_2$ if and only if the performance (or selection) curve of $(S_1, P_1)$ is below the
performance (or selection) curve of $(S_2, P_2)$;


• the score $S_1$ is more discriminatory on the population $P_1$ than the score $S_2$ on the
population $P_2$ if and only if the discriminant curve of $(S_1, P_1)$ is above the discriminant
curve of $(S_2, P_2)$.


Figure 15.40 illustrates the case where the score $S_1$ is better than the score $S_2$. However,
the order is only partial. Most of the time, the two scores cannot be globally compared. An
example is provided in Figure 15.41. The second score is not very good at distinguishing good
and bad risks when it takes small values, but it is close to a perfect score when it takes high
values.


FIGURE 15.39: Performance, selection and discriminant curves

FIGURE 15.40: The score $S_1$ is better than the score $S_2$

FIGURE 15.41: Illustration of the partial ordering between two scores


15.3.3 Statistical methods 

Since the quantitative tools for comparing two scores are numerous, we focus on two 
non-parametric measures: the Kolmogorov-Smirnov test and the Gini coefficient. 
15.3.3.1 Kolmogorov-Smirnov test 


We consider the cumulative distribution functions:

$$F_0(s) = \Pr\{S \leq s \mid Y = 0\}$$

and:

$$F_1(s) = \Pr\{S \leq s \mid Y = 1\}$$

The score S is relevant if we have the stochastic dominance order $F_0 \succ F_1$. In this case,
the score quality is measured by the Kolmogorov-Smirnov statistic:

$$\mathrm{KS} = \max_s \left|F_0(s) - F_1(s)\right|$$

It takes the value 1 if the score is perfect. The KS statistic may be used to verify that the
score is not random. We then test the assumption $H_0 : \mathrm{KS} = 0$ by using the tabulated critical
values⁷⁴. In Figure 15.42, we give an example with 5000 observations. The KS statistic is
equal to 36%, which implies that $H_0$ is rejected at the confidence level 1%.
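
The KS statistic can be computed from the two empirical distribution functions. The following sketch is a direct and unoptimized illustration with hypothetical names and data; the maximum is searched over the observed scores, which is sufficient because the empirical distribution functions only jump at these points.

```python
def ks_statistic(scores, labels):
    """Maximum distance between the empirical distribution functions of the
    score for bad risks (label 0) and good risks (label 1)."""
    bad = [s for s, y in zip(scores, labels) if y == 0]
    good = [s for s, y in zip(scores, labels) if y == 1]

    def cdf(sample, s):
        # Empirical estimate of Pr{S <= s} within the sample
        return sum(1 for v in sample if v <= s) / len(sample)

    return max(abs(cdf(bad, s) - cdf(good, s)) for s in scores)


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
ks = ks_statistic(scores, labels)

# A score that perfectly separates bad and good risks
ks_perfect = ks_statistic([1, 2, 3, 4], [0, 0, 1, 1])
```

On the first sample the statistic is equal to 50%, whereas the perfectly separating score reaches the maximum value of 1.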


74 The critical values at the 5% confidence level are equal to:

  n   |  10      50      100     500     5000
  CV  |  40.9%   18.8%   13.4%   6.0%    1.9%


FIGURE 15.42: Comparison of the distributions $F_0(s)$ and $F_1(s)$


15.3.3.2 Gini coefficient 


The Lorenz curve The Gini coefficient is the most widely used statistic for measuring
the performance of a score. It is related to the concept of Lorenz curve, which is a
graphical representation of the concentration. Let X and Y be two random variables. The
Lorenz curve $y = \mathcal{L}(x)$ is the parametric curve defined by:

$$\begin{cases} x = \Pr\{X \leq x\} \\ y = \Pr\{Y \leq y \mid X \leq x\} \end{cases}$$


In economics, x represents the proportion of individuals that are ranked by income while 
y represents the proportion of income. In this case, the Lorenz curve is a graphical repre- 
sentation of the distribution of income and is used for illustrating inequality of the wealth 
distribution between individuals. For example, we observe that 70% of individuals have only 
34% of total income in Figure 15.43. 


Definition of the Gini coefficient The Lorenz curve has two limit cases. If the wealth 
is perfectly concentrated, one individual holds 100% of the total wealth. If the wealth 
is perfectly allocated between all the individuals, the corresponding Lorenz curve is the 
bisecting line. We define the Gini coefficient by: 
A 

A+B 

where A is the area between the Lorenz curve and the curve of perfect equality, and B is the 
area between the curve of perfect concentration and the Lorenz curve. By construction, we 
have 0 < Gini (£) < 1. The Gini coefficient is equal to zero in the case of perfect equality 
and one in the case of perfect concentration. We have: 


Gini (L£) 


Gini (£)= 1-2 | L(x) dx 
0 
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Application to credit scoring We can interpret the selection curve as a Lorenz 
curve. We recall that F(s) = Pr{S < s}, Fo(s) = Pr{S<s|Y=0} and Fi(s) = 
Pr{S < s| Y = 1}. The selection curve is defined by the following parametric coordinates: 


{ x (s) =1—F(s) 
y (s) = 1 — Fo (s) 


The selection curve measures the capacity of the score for not selecting bad risks. We could
also build the Lorenz curve that measures the capacity of the score for selecting good risks:

$$
\begin{cases}
x(s) = \Pr\{S > s\} = 1 - F(s) \\
y(s) = \Pr\{S > s \mid Y = 1\} = 1 - F_1(s)
\end{cases}
$$

It is called the precision curve. Another popular graphical tool is the receiver operating
characteristic (or ROC) curve (Powers, 2011), which is defined by:

$$
\begin{cases}
x(s) = \Pr\{S > s \mid Y = 0\} = 1 - F_0(s) \\
y(s) = \Pr\{S > s \mid Y = 1\} = 1 - F_1(s)
\end{cases}
$$


An example for a given score S is provided in Figure 15.44. For all the three curves, we can
calculate the Gini coefficient. Since the precision and ROC curves are located above the
bisecting line, the Gini coefficient associated to the Lorenz curve $\mathcal{L}$ becomes⁷⁵:

$$
\mathcal{G}ini(\mathcal{L}) = 2\int_0^1 \mathcal{L}(x)\,\mathrm{d}x - 1
$$

75 An alternative to the Gini coefficient is the AUC measure, which corresponds to the area under the
ROC curve. However, they give the same information since they are related by the equation:

$$
\mathcal{G}ini(\mathrm{ROC}) = 2 \times \mathrm{AUC}(\mathrm{ROC}) - 1
$$

The Gini coefficient of the score S is then computed as follows:

$$
\mathcal{G}ini^{\star}(S) = \frac{\mathcal{G}ini(\mathcal{L})}{\mathcal{G}ini(\mathcal{L}^{\star})}
$$

where $\mathcal{L}^{\star}$ is the Lorenz curve associated to the perfect score.
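The relationship between the ROC curve, the AUC measure and the Gini coefficient can be illustrated numerically. The following Python sketch is only an illustration: the function names (`roc_curve_points`, `area_under_curve`, `gini_roc`) and the four-observation sample are assumptions introduced here, and the labels follow the convention of this section (1 for a good risk, 0 for a bad risk).

```python
def roc_curve_points(scores, labels):
    """ROC points (1 - F0(s), 1 - F1(s)) obtained by lowering the cut-off s.

    labels: 1 = good risk, 0 = bad risk (hypothetical toy convention).
    """
    n_good = sum(labels)
    n_bad = len(labels) - n_good
    # Sort by decreasing score: lowering the cut-off accepts more observations.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    accepted_good = accepted_bad = 0
    k = 0
    while k < len(order):
        s = scores[order[k]]
        # Tied scores are processed together so that the curve is well defined.
        while k < len(order) and scores[order[k]] == s:
            if labels[order[k]] == 1:
                accepted_good += 1
            else:
                accepted_bad += 1
            k += 1
        points.append((accepted_bad / n_bad, accepted_good / n_good))
    return points


def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini_roc(scores, labels):
    """Gini(ROC) = 2 x AUC(ROC) - 1."""
    return 2.0 * area_under_curve(roc_curve_points(scores, labels)) - 1.0


gini_perfect = gini_roc([1, 2, 3, 4], [0, 0, 1, 1])   # perfectly separating score
gini_mixed = gini_roc([1, 2, 3, 4], [0, 1, 0, 1])     # partially informative score
```

With a perfectly separating score, the ROC Gini coefficient is equal to one, which is consistent with the normalization by the perfect score used above.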


[Figure: selection, precision and ROC curves of the score S, plotted on the unit square]

FIGURE 15.44: Selection, precision and ROC curves


Remark 202 The Gini coefficient is not necessarily the same for the three curves. However,
if the population is homogeneous, we generally obtain very similar figures⁷⁶.


15.3.3.3 Choice of the optimal cut-off 


The choice of the optimal cut-off s* depends on the objective function. For instance, we 
can calibrate s* in order to achieve a minimum market share. We can also fix a given selection 
rate. More generally, the objective function can be the profitability of the activity. From a 
statistical point of view, we must distinguish the construction of the scoring model and the 
decision rule. In statistical learning, we generally consider three datasets: the training set, 
the validation set and the test set. The training set is used for calibrating the model and 
its parameters whereas the validation set helps to avoid overfitting. But the decision rule is
based on the test set.


76 For instance, we obtain the following results with the score S that has been used in Figure 15.44:

Curve        Gini(L)    Gini(L*)    Gini*(S)
Selection    20.41%      40.02%     51.01%
Precision    30.62%      59.98%     51.05%
ROC          51.03%     100.00%     51.03%

Confusion matrix A confusion matrix is a special case of contingency matrix. Each row
of the matrix represents the frequency in a predicted class while each column represents the
frequency in an actual class. Using the test set, it takes the following form:


           Y = 0                          Y = 1
S ≤ s      n_{0,0}                        n_{0,1}
S > s      n_{1,0}                        n_{1,1}
           n_0 = n_{0,0} + n_{1,0}        n_1 = n_{0,1} + n_{1,1}


where n; ; represents the number of observations of the cell (i, j). We notice that each cell 
of this table can be interpreted as follows: 


           Y = 0                      Y = 1
S ≤ s      It is rejected             It is rejected,
           and it is a bad risk       but it is a good risk
           (true negative)            (false negative)
S > s      It is accepted,            It is accepted
           but it is a bad risk       and it is a good risk
           (false positive)           (true positive)
           (negative)                 (positive)


The cells (S ≤ s, Y = 0) and (S > s, Y = 1) correspond to observations that are well-
classified: true negative (TN) and true positive (TP). The cells (S > s, Y = 0) and
(S ≤ s, Y = 1) correspond to two types of errors:


1. a false positive (FP) can induce a future loss, because it may default: this is a type I
error;


2. a false negative (FN) potentially corresponds to a loss of a future P&L⁷⁷: this is a
type II error.


Classification ratios Binary classification defines many metrics for measuring the per-
formance of the classifier⁷⁸ (Fawcett, 2006):

$$
\begin{aligned}
\text{True Positive Rate} \quad & \mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \\
\text{False Negative Rate} \quad & \mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}} = 1 - \mathrm{TPR} \\
\text{True Negative Rate} \quad & \mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \\
\text{False Positive Rate} \quad & \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = 1 - \mathrm{TNR}
\end{aligned}
$$

The true positive rate (TPR) is also known as the sensitivity or the recall. It measures the
proportion of real good risks that are correctly predicted as good risks. Fawcett (2006) also
defines the precision or the positive predictive value (PPV):

$$
\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
$$

77 This is an opportunity cost.
78 We rewrite the confusion matrix as follows:

           Y = 0             Y = 1
S ≤ s      TN                FN
S > s      FP                TP
           N = TN + FP       P = FN + TP

It measures the proportion of predicted good risks that are really good risks. Besides
these metrics, statisticians also use two generic metrics:


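1. the accuracy considers the classification of both negatives and positives:

$$
\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{P} + \mathrm{N}} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{TN} + \mathrm{FP}}
$$

2. the $F_1$ score is the harmonic mean of precision and sensitivity:

$$
F_1 = \frac{2}{1/\text{precision} + 1/\text{sensitivity}} = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}
$$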


Example 171 We consider three scoring systems that have been calibrated on a training 
set. These systems produce a score between 0 and 1000. A low value predicts a bad risk 
while a high value predicts a good risk. In order to calibrate the cut-off, we consider a test 
set, which is composed of 10000 new observations. In Table 15.23, we report the confusion 
matrix of each scoring system for different cut-off values (100, 200 and 500). 


TABLE 15.23: Confusion matrix of three scoring systems and three cut-off values s

Score    |    s = 100    |    s = 200    |    s = 500
S1       |  386     616  |  698    1304  | 1330    3672
         | 1614    7384  | 1302    6696  |  670    4328
S2       |  372     632  |  700    1304  | 1386    3616
         | 1628    7368  | 1300    6696  |  614    4384
S3       |  382     616  |  656    1344  | 1378    3624
         | 1618    7384  | 1344    6656  |  622    4376
Perfect  | 1000       0  | 2000       0  | 2000    3000
         | 1000    8000  |    0    8000  |    0    5000
Using confusion matrices given in Table 15.23, we calculate the different classification
ratios and report them in Table 15.24. In addition to the three scoring systems, we have
also considered a perfect score in order to show what the best value is for each classification
ratio. Finally, we indicate the best scoring system in Table 15.25. We notice that it depends
on the ratio and on the value of the cut-off. For instance, if we want to maximize the true
positive ratio or minimize the false negative ratio, $S_1$ is the best scoring system for low
values of s while $S_2$ is better when s is equal to 500. For the other ratios, $S_1$ seems to be the
best system when s = 100, otherwise $S_2$ dominates $S_1$ and $S_3$ when s = 200 or s = 500.
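The classification ratios can be recomputed directly from the four cells of a confusion matrix. The sketch below is a minimal illustration: the function name `classification_ratios` is an assumption, and the counts are those of the score $S_1$ for the cut-off s = 100 in Table 15.23, read with the layout of the confusion matrix (TN and FN on the first row, FP and TP on the second row).

```python
def classification_ratios(tn, fn, fp, tp):
    """Binary classification ratios from the cells of a confusion matrix.

    No guard is included for empty classes (p = 0, n = 0 or tp + fp = 0).
    """
    p = tp + fn            # real positives (good risks)
    n = tn + fp            # real negatives (bad risks)
    tpr = tp / p           # sensitivity or recall
    tnr = tn / n
    ppv = tp / (tp + fp)   # precision
    return {
        "TPR": tpr,
        "FNR": 1.0 - tpr,
        "TNR": tnr,
        "FPR": 1.0 - tnr,
        "PPV": ppv,
        "ACC": (tp + tn) / (p + n),
        "F1": 2.0 * ppv * tpr / (ppv + tpr),
    }


# Score S1 with cut-off s = 100 (Table 15.23)
ratios = classification_ratios(tn=386, fn=616, fp=1614, tp=7384)
ratios_pct = {name: round(100.0 * value, 1) for name, value in ratios.items()}
```

Expressed in percent and rounded to one decimal, these values correspond to the first row of Table 15.24.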


Remark 203 We recall that $F_0(s) = \Pr\{S \le s \mid Y = 0\}$ and $F_1(s) = \Pr\{S \le s \mid Y = 1\}$.
We deduce that $\mathrm{TNR} = F_0(s)$, $\mathrm{FNR} = F_1(s)$, $\mathrm{FPR} = 1 - F_0(s)$ and $\mathrm{TPR} = 1 - F_1(s)$.
Therefore, the ROC curve is the parametric curve, where the x-coordinates are the false
positive rates and the y-coordinates are the true positive rates. Generally, we note $\alpha$ and
$\beta$ the type I and II errors. We may also interpret the ROC curve as the relationship of
$1 - \beta(s)$ with respect to $\alpha(s)$.

TABLE 15.24: Binary classification ratios (in %) of the three scoring systems

Score      s   |   TPR    FNR    TNR    FPR    PPV    ACC     F1
S1        100  |  92.3    7.7   19.3   80.7   82.1   77.7   86.9
          200  |  83.7   16.3   34.9   65.1   83.7   73.9   83.7
          500  |  54.1   45.9   66.5   33.5   86.6   56.6   66.6
S2        100  |  92.1    7.9   18.6   81.4   81.9   77.4   86.7
          200  |  83.7   16.3   35.0   65.0   83.7   74.0   83.7
          500  |  54.8   45.2   69.3   30.7   87.7   57.7   67.5
S3        100  |  92.3    7.7   19.1   80.9   82.0   77.7   86.9
          200  |  83.2   16.8   32.8   67.2   83.2   73.1   83.2
          500  |  54.7   45.3   68.9   31.1   87.6   57.5   67.3
Perfect   100  | 100.0    0.0   50.0   50.0   88.9   90.0   94.1
          200  | 100.0    0.0  100.0    0.0  100.0  100.0  100.0
          500  |  62.5   37.5  100.0    0.0  100.0   70.0   76.9


TABLE 15.25: Best scoring system

Cut-off  |  TPR     FNR     TNR   FPR   PPV   ACC   F1
100      |  S1/S3   S1/S3   S1    S1    S1    S1    S1
200      |  S1/S2   S1/S2   S2    S2    S2    S2    S2
500      |  S2      S2      S2    S2    S2    S2    S2


15.4 Exercises 


15.4.1 Elastic net regression 


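We consider the standard linear model:

$$
Y = X\beta + U
$$

where Y is a n × 1 vector, X is a n × K matrix and $U \sim \mathcal{N}(0, \sigma^2 I_n)$. Let $\hat\beta$ be the estimator
of $\beta$, that is the solution of the following least squares problem:

$$
\hat\beta = \arg\min \frac{1}{2}(Y - X\beta)^\top (Y - X\beta) + \frac{\lambda}{2}\left(\alpha \|\beta\|_1 + (1 - \alpha)\|\beta\|_2^2\right)
$$

where $\lambda \ge 0$ and $\alpha \in [0, 1]$.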


1. We consider the case $\alpha = 0$, which corresponds to the ridge regression.

(a) Find the optimal estimator $\hat\beta^{\mathrm{ridge}}$.

(b) What is the relationship between the ridge estimator $\hat\beta^{\mathrm{ridge}}$ and the ordinary
least squares estimator $\hat\beta^{\mathrm{ols}}$?

(c) Deduce the expression of $\mathbb{E}[\hat\beta^{\mathrm{ridge}}]$. Show that $\hat\beta^{\mathrm{ridge}}$ is a biased estimator except
if $\lambda = 0$.

(d) Demonstrate that the covariance matrix of $\hat\beta^{\mathrm{ridge}}$ is equal to:

$$
\operatorname{var}(\hat\beta^{\mathrm{ridge}}) = \sigma^2 (X^\top X + Q)^{-1}
$$

where Q is a matrix to determine. Deduce that:

$$
\operatorname{var}(\hat\beta^{\mathrm{ols}}) \succ \operatorname{var}(\hat\beta^{\mathrm{ridge}})
$$

where $\succ$ is the positive definite ordering.

(e) Let $\hat{Y}$ be the predicted values of Y. If $\hat{Y} = HY$, the model degree of freedom is
equal to the trace of H. Show that the degree of freedom of the ridge model is
equal to:

$$
\mathrm{df}^{(\mathrm{model})} = \sum_{k=1}^{K} \frac{s_k^2}{s_k^2 + \lambda}
$$

where $(s_1, \ldots, s_K)$ are the singular values of X (Hastie et al., 2009).

(f) What do the previous results become when X is an orthonormal matrix?


2. We consider the case a > 0, which corresponds to the elastic net regression (Zou and 
Hastie, 2005). 


(a) Write the corresponding QP program. 
(b) Consider the data of Example 164 on page 936. Compare the estimates $\hat\beta$ when
$\alpha$ is respectively equal to 0, 0.25, 0.5 and 1.0.


15.4.2 Cross-validation of the ridge linear regression 
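We consider the ridge estimator:

$$
\hat\beta = \arg\min \frac{1}{2}(Y - X\beta)^\top (Y - X\beta) + \frac{\lambda}{2}\beta^\top\beta
$$

where Y is a n × 1 vector, X is a n × K matrix and $\beta$ is a K × 1 vector.

1. Compute the ridge estimator $\hat\beta$.

2. We note $\hat\beta_{-i}$ the ridge estimator when leaving out the i-th observation:

$$
\hat\beta_{-i} = \arg\min \frac{1}{2}(Y_{-i} - X_{-i}\beta)^\top (Y_{-i} - X_{-i}\beta) + \frac{\lambda}{2}\beta^\top\beta
$$

where $Y_{-i}$ and $X_{-i}$ correspond to Y and X with the i-th row removed. By using the
relationships $X^\top X = X_{-i}^\top X_{-i} + x_i x_i^\top$ and $X^\top Y = X_{-i}^\top Y_{-i} + x_i y_i$, show that:

$$
\hat\beta_{-i} = \hat\beta - \frac{(X^\top X + \lambda I_K)^{-1} x_i \hat{u}_i}{1 - h_i}
$$

where $\hat{u}_i = y_i - x_i^\top \hat\beta$ and $h_i = x_i^\top (X^\top X + \lambda I_K)^{-1} x_i$.

3. We note $\hat{y}_{i,-i} = x_i^\top \hat\beta_{-i}$ and $\hat{u}_{i,-i} = y_i - \hat{y}_{i,-i}$. Demonstrate that:

$$
\hat{u}_{i,-i} = \frac{\hat{u}_i}{1 - h_i}
$$

4. Calculate the predicted residual error sum of squares (PRESS) statistic:

$$
\mathrm{Press} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_{i,-i})^2
$$

where $\hat{y}_{i,-i}$ is the estimate of $y_i$ based on the ridge model when leaving out the i-th
observation.

5. In the OLS regression, we reiterate that $\mathrm{df}^{(\mathrm{model})} = \operatorname{trace} H = K$ where H is the
hat matrix for the OLS regression. Define the corresponding hat matrix $H(\lambda)$ for the
ridge regression. Show that:

$$
\mathrm{df}^{(\mathrm{model})}(\lambda) = \operatorname{trace} H(\lambda) = \sum_{k=1}^{K} \frac{s_k^2}{s_k^2 + \lambda}
$$

where $(s_1, \ldots, s_K)$ are the singular values of X.

6. The generalized cross-validation (GCV) statistic is defined by:

$$
\mathrm{GCV} = n^{-1}\left(1 - \bar{h}\right)^{-2} \mathrm{RSS}(\hat\beta(\lambda))
$$

where $\bar{h} = n^{-1}\sum_{i=1}^{n} H(\lambda)_{i,i}$ and $\mathrm{RSS}(\hat\beta(\lambda))$ is the residual sum of squares calculated
with the ridge estimator $\hat\beta(\lambda)$. What is the relationship between GCV and PRESS
statistics? What is the impact of $\lambda$?

7. Show that another expression of the GCV statistic is:

$$
\mathrm{GCV} = n\left(n - K + \sum_{k=1}^{K} \frac{\lambda}{s_k^2 + \lambda}\right)^{-2} \mathrm{RSS}(\hat\beta(\lambda))
$$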

8. Using the data of Example 165 on page 940, calculate the estimates $\hat\beta_{-i}$ when $\lambda$ is
equal to 3.0. Compute also $\hat{y}_{i,-i}$, $\hat{u}_{i,-i}$, $\hat{u}_i$ and $h_i$. Deduce then the value of PRESS
and GCV statistics.
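The identities of this exercise can be checked numerically before attempting the proofs. The following NumPy sketch is only a sanity check under stated assumptions: the data are simulated (they are not the data of Example 165), the helper `ridge` is a hypothetical name, and the PRESS statistic is taken here as the mean of the squared leave-one-out residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, lam = 30, 4, 3.0
X = rng.normal(size=(n, K))                       # simulated regressors
Y = X @ rng.normal(size=K) + rng.normal(size=n)   # simulated response


def ridge(X, Y, lam):
    """Ridge estimator (X'X + lam I)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)


beta = ridge(X, Y, lam)
H = X @ np.linalg.inv(X.T @ X + lam * np.eye(K)) @ X.T   # hat matrix H(lam)
h = np.diag(H)
u = Y - X @ beta                                         # in-sample residuals

# Brute-force leave-one-out residuals: refit the model n times
u_loo = np.array([
    Y[i] - X[i] @ ridge(np.delete(X, i, axis=0), np.delete(Y, i), lam)
    for i in range(n)
])

press = np.mean(u_loo ** 2)
gcv = np.mean((u / (1.0 - h.mean())) ** 2)

# Degrees of freedom from the singular values of X
s = np.linalg.svd(X, compute_uv=False)
df_model = np.sum(s ** 2 / (s ** 2 + lam))
gcv_alt = n * (n - K + np.sum(lam / (s ** 2 + lam))) ** -2 * np.sum(u ** 2)
```

The brute-force residuals coincide with the shortcut $\hat{u}_i / (1 - h_i)$, so the PRESS statistic does not require refitting the model n times.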


15.4.3 K-means and the Lloyd’s algorithm 


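1. We consider n observations with K attributes $x_{i,k}$ (i = 1, ..., n and k = 1, ..., K).
We note $x_i$ the K × 1 vector $(x_{i,1}, \ldots, x_{i,K})$. Show that:

$$
\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \|x_i - x_j\|^2 = n\sum_{i=1}^{n} \|x_i - \bar{x}\|^2
$$

where:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

2. We recall that the loss function of the K-means clustering method is:

$$
\mathcal{L}(\mathcal{C}) = \frac{1}{2}\sum_{j=1}^{n_C}\sum_{\mathcal{C}(i)=j}\sum_{\mathcal{C}(i')=j} \|x_i - x_{i'}\|^2
$$

Deduce that:

$$
\mathcal{L}(\mathcal{C}) = \sum_{j=1}^{n_C} n_j \sum_{\mathcal{C}(i)=j} \|x_i - \bar{x}_j\|^2
$$

where $\bar{x}_j$ and $n_j$ are two quantities to define.

3. We consider the following optimization function:

$$
\{\mu_1^{\star}, \ldots, \mu_{n_C}^{\star}\} = \arg\min \sum_{j=1}^{n_C} n_j \sum_{\mathcal{C}(i)=j} \|x_i - \mu_j\|^2
$$

Show that $\mu_j^{\star} = \bar{x}_j$. Comment on this result.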


4. Apply the K-means analysis to Example 169 and compare the results with those 
obtained with the discriminant analysis. 


15.4.4 Derivation of the principal component analysis 


The following exercise is taken from Chapters 1 and 2 of Jolliffe (2002). Let X be a K × 1
random vector, whose covariance matrix is equal to $\Sigma$. We consider the linear transform
$Z_j = \beta_j^\top X$ where $\beta_j$ is a K × 1 vector.


1. Calculate $\operatorname{var}(Z_1)$ and define the PCA objective function to estimate $\beta_1$. Show that
$\beta_1$ is the eigenvector associated to the largest eigenvalue of $\Sigma$.


2. Calculate $\operatorname{var}(Z_2)$ and $\operatorname{cov}(Z_1, Z_2)$. Define then the PCA objective function to es-
timate $\beta_2$. Show that $\beta_2$ is the eigenvector associated to the second eigenvalue of
$\Sigma$.


15.4.5 Two-class separation maximization 
We note $x_i$ the K × 1 vector of exogenous variables X for the i-th observation.

1. We consider the case of J classes. We note $\hat\mu_j$ the mean vector for class $\mathcal{C}_j$:

$$
\hat\mu_j = \frac{1}{n_j}\sum_{i \in \mathcal{C}_j} x_i
$$

and $\hat\mu$ the mean vector for the entire sample:

$$
\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

Calculate the scatter matrices $S$, $S_W$ and $S_B$. Show that:

$$
S = S_W + S_B
$$

2. We now consider the two-class problem, and we note $y_i = \beta^\top x_i$. Show that:

$$
\beta^\top S_B \beta = \frac{n_1 n_2}{n_1 + n_2}\left(\tilde\mu_1 - \tilde\mu_2\right)^2
$$

where:

$$
\tilde\mu_j = \frac{1}{n_j}\sum_{i \in \mathcal{C}_j} y_i
$$

3. Show that:

$$
\beta^\top S_W \beta = \tilde{s}_1^2 + \tilde{s}_2^2
$$

where:

$$
\tilde{s}_j^2 = \sum_{i \in \mathcal{C}_j} \left(y_i - \tilde\mu_j\right)^2
$$

4. Deduce that the Fisher optimization program is:

$$
\beta^{\star} = \arg\max \frac{\left(\tilde\mu_1 - \tilde\mu_2\right)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}
$$

What is the interpretation of this statistical problem?

5. Find the optimal value $\beta^{\star}$ and verify that the decision boundary is linear.

6. Using Example 170 on page 967, calculate $S_W$ and $S_B$. Find the optimal value $\beta^{\star}$ and
compute the score for each observation. Propose an assignment decision based on the
mid-point rule. Comment on these results.


15.4.6 Maximum likelihood estimation of the probit model 


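1. Given a sample $\{(x_i, y_i), i = 1, \ldots, n\}$, find the log-likelihood function of the probit
model.


2. Let $J(\beta)$ be the Jacobian matrix of the log-likelihood vector. Show that:

$$
J_{i,k}(\beta) = \frac{\left(y_i - \Phi(x_i^\top\beta)\right)\phi(x_i^\top\beta)}{\Phi(x_i^\top\beta)\left(1 - \Phi(x_i^\top\beta)\right)}\, x_{i,k}
$$

for i = 1, ..., n and k = 1, ..., K. Define the score vector $S(\beta)$.


3. Let $H(\beta)$ be the Hessian matrix of the log-likelihood function. Show that:

$$
H(\beta) = -\sum_{i=1}^{n} H_i \cdot \left(\phi(x_i^\top\beta) \cdot x_i x_i^\top\right)
$$

where:

$$
H_i = y_i\,\frac{\phi(x_i^\top\beta) + x_i^\top\beta\,\Phi(x_i^\top\beta)}{\Phi(x_i^\top\beta)^2} + (1 - y_i)\,\frac{\phi(x_i^\top\beta) - x_i^\top\beta\left(1 - \Phi(x_i^\top\beta)\right)}{\left(1 - \Phi(x_i^\top\beta)\right)^2}
$$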


4. Propose a Newton-Raphson algorithm to find the ML estimate. 


15.4.7 Computation of feed-forward neural networks 


We consider the canonical neural network without constant and direct link.


1. We note X the input matrix of dimension $n \times n_x$, and Y the output matrix of dimension
$n \times n_y$. Let $\hat{Y}$ be the prediction of Y. Find the matrix relationship between X and $\hat{Y}$
with respect to the parameter matrices $\beta$ and $\gamma$ of dimension $n_z \times n_x$ and $n_y \times n_z$.


2. We assume that the activation functions $f_{x,z}$ and $f_{z,y}$ are the identity function.
Demonstrate that the neural network is equivalent to an overidentified linear model
or a constrained linear regression.

3. We consider the additive loss function:

$$
\mathcal{L}(\theta) = \sum_{i=1}^{n}\sum_{j=1}^{n_y} \mathcal{L}_{i,j}(\theta)
$$

where:

$$
\mathcal{L}_{i,j}(\theta) = \xi\left(\hat{y}_j(x_i), y_{i,j}\right)
$$

Calculate the matrices $\partial_\gamma \mathcal{L}(\theta)$ and $\partial_\beta \mathcal{L}(\theta)$ of dimension $n_y \times n_z$ and $n_z \times n_x$.


4. We assume that the activation functions $f_{x,z}$ and $f_{z,y}$ correspond to the logistic func-
tion and the loss is the least squares error function. Find the matrices $\partial_\gamma \mathcal{L}(\theta)$ and
$\partial_\beta \mathcal{L}(\theta)$.


5. Same question if we consider the cross-entropy error loss:

$$
\mathcal{L}(\theta) = -\sum_{i=1}^{n}\left(y_i \ln \hat{y}(x_i) + (1 - y_i)\ln\left(1 - \hat{y}(x_i)\right)\right)
$$

6. Explain why we cannot use the property of additivity in the case of the softmax
function.


7. Calculate the matrices $\partial_\gamma \mathcal{L}(\theta)$ and $\partial_\beta \mathcal{L}(\theta)$ when $f_{z,y}$ is the softmax function, $f_{x,z}$ is
the identity function, and the loss function is the multi-class error function:

$$
\mathcal{L}(\theta) = -\sum_{i=1}^{n}\sum_{j=1}^{n_C} y_{i,j} \ln \hat{y}_j(x_i)
$$

where $n_C$ is the number of classes⁷⁹.


8. Extend the previous results when we consider a constant between the x's and the z's,
a constant between the z's and the y's and a direct link between the x's and the y's.


15.4.8 Primal and dual problems of support vector machines 


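The goal of this exercise is to determine the primal and dual problems of the differ-
ent SVM models. For each problem, we ask to write the primal problem into a quadratic
programming (QP) format:

$$
\begin{aligned}
\theta^{\star} = {} & \arg\min \frac{1}{2}\theta^\top Q \theta - \theta^\top R \\
& \text{s.t.} \quad
\begin{cases}
A\theta = B \\
C\theta \ge D \\
\theta^- \le \theta \le \theta^+
\end{cases}
\end{aligned}
$$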


where @ is the vector of parameters. Then, we ask to find the corresponding dual problem 
and also the associated QP matrix form. 


79 Hint: Use the following decomposition $\mathcal{L}(\theta) = \sum_{i=1}^{n} \mathcal{L}_i(\theta)$.

Hard margin classification


We first begin with the hard margin classifier. We recall that the primal optimization
problem is:

$$
\begin{aligned}
\{\hat\beta_0, \hat\beta\} = {} & \arg\min \frac{1}{2}\|\beta\|_2^2 \\
& \text{s.t.} \quad y_i\left(\beta_0 + x_i^\top\beta\right) \ge 1 \quad \text{for } i = 1, \ldots, n
\end{aligned}
$$

1. By noting $\theta$ the vector of parameters, write the primal problem in the QP form.


2. We note $\alpha = (\alpha_1, \ldots, \alpha_n)$ the vector of Lagrange coefficients associated to the con-
straints $y_i(\beta_0 + x_i^\top\beta) \ge 1$. Write the Lagrange function and find the first-order con-
ditions.


3. Deduce that the dual problem is:

$$
\begin{aligned}
\hat\alpha = {} & \arg\max \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\alpha_j y_i y_j x_i^\top x_j \\
& \text{s.t.} \quad \alpha \ge 0
\end{aligned}
$$

4. Write this dual problem as a QP problem.

5. Determine the dual QP problem directly by applying Equation (A.12) on page 1047.
What do you observe? How to fix this issue?
Soft margin classification with binary hinge loss 
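We now consider the soft margin classification problem:

$$
\begin{aligned}
\{\hat\beta_0, \hat\beta, \hat\xi\} = {} & \arg\min \frac{1}{2}\|\beta\|_2^2 + C\sum_{i=1}^{n}\xi_i \\
& \text{s.t.} \quad
\begin{cases}
y_i\left(\beta_0 + x_i^\top\beta\right) \ge 1 - \xi_i \\
\xi_i \ge 0
\end{cases}
\quad \text{for } i = 1, \ldots, n
\end{aligned}
$$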
1. Write the primal problem as a QP problem. 


2. Show that the objective function of the dual problem does not change compared to 
the hard margin classifier. What does the dual QP problem become? 


3. How can we characterize the support vectors? 
4. Find the optimal values of &;. 


5. We consider the training data set given in Table 15.18 on page 989. Represent the
optimal values of $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$, $\sum_{i=1}^{n}\hat\xi_i$ and the margin M with respect to C. Compare
the optimal hyperplane when C = 0.07 with the optimal hyperplane obtained with
the hard margin classifier.


Soft margin classification with squared hinge loss 
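We replace the binary hinge loss by the squared hinge loss:

$$
\begin{aligned}
\{\hat\beta_0, \hat\beta, \hat\xi\} = {} & \arg\min \frac{1}{2}\|\beta\|_2^2 + C\sum_{i=1}^{n}\xi_i^2 \\
& \text{s.t.} \quad
\begin{cases}
y_i\left(\beta_0 + x_i^\top\beta\right) \ge 1 - \xi_i \\
\xi_i \ge 0
\end{cases}
\quad \text{for } i = 1, \ldots, n
\end{aligned}
$$

1. Write the primal problem as a QP problem.
2. Find the dual problem. What do you observe?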


3. We consider the training data set given in Table 15.18 on page 989. Study the con-
vergence of the optimal values of $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$, $\sum_{i=1}^{n}\hat\xi_i$ and the margin M with respect
to C. What is the main difference between binary and squared hinge loss functions?


4. We introduce in the training set two new points (6.0, 5.0, +1) (i = 16) and (2.0, 2.0, −1)
(i = 17). Calculate $\hat\beta_0$, $\hat\beta$, $\hat\xi_{16}$ and $\hat\xi_{17}$ when the constant C is equal to 1.


Soft margin classification with ramp loss 


1. Compare 0-1, binary hinge, squared hinge and ramp loss functions.


2. Using the property min(1,max(0,a)) = max(0,a) — max(0,a—1), show that 
LP (xi, yi) is the difference of two convex functions. Comment on this result. 


LS-SVM regression 


We consider the following optimization problem: 
ee 1 “ 
{80,8,€} = agmine e 
i=1 
st. y= Bot+2{)B+& fori=1,...,n 
1. Write the primal problem as a QP problem. 


2. Find the dual QP problem. 


3. Deduce the expression of 8) and £. Show that the residuals are centered. 


e-SVM regression 


We consider the following optimization problem: 


{Aob EE) = argmin5 Iba +O >> (Er +E) 
i= 


Bo+al B-y <e+& 
yi — Bo -2pBSe+ ef 


fori=1,...,n 


where € > 0. 
1. Write the primal problem as a QP problem. 
2. Find the dual problem. 
3. Write the dual problem as a QP problem. 
4. Deduce the expression of Bo and B. 
5. Calculate the optimal values €~ and Ê+. 


6. What does the optimization problem become when $\varepsilon = 0$?




15.4.9 Derivation of the AdaBoost algorithm as the solution of the ad- 
ditive logit model 


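We consider a special case of additive models, where the loss function is specified as
follows:

$$
\mathcal{L}\left(\beta_{(s)}, f_{(s)}\right) = \sum_{i=1}^{n} \ell\left(y_i, \hat{g}_{(s-1)}(x_i) + \beta_{(s)} f_{(s)}(x_i)\right)
$$

where $\hat{g}_{(s)}(x) = \sum_{s'=1}^{s} \hat\beta_{(s')} \hat{f}_{(s')}(x)$, $\hat{f}_{(s)}$ is the s-th optimal classification model and
$\ell(y, f(x)) = e^{-y f(x)}$.


1. Show that:

$$
\mathcal{L}\left(\beta_{(s)}, f_{(s)}\right) = \sum_{i=1}^{n} w_{i,s}\, e^{-y_i \beta_{(s)} f_{(s)}(x_i)}
$$

where $w_{i,s}$ is a quantity to determine.

2. Find an expression of $\mathcal{L}(\beta_{(s)}, f_{(s)})$ that depends on the error rate:

$$
L_{(s)} = \frac{\sum_{i=1}^{n} w_{i,s}\, \mathbb{1}\{y_i \ne \hat{y}_{i,s}\}}{\sum_{i=1}^{n} w_{i,s}}
$$

where $\hat{y}_{i,s} = f_{(s)}(x_i)$.


3. We assume that $f_{(s)}$ is known. Verify that the optimal value of $\beta_{(s)}$:

$$
\hat\beta_{(s)} = \arg\min \mathcal{L}\left(\beta_{(s)}, f_{(s)}\right)
$$

is equal to:

$$
\hat\beta_{(s)} = \frac{1}{2}\ln\left(\frac{1 - L_{(s)}}{L_{(s)}}\right)
$$

4. Suppose that $f_{(s)}$ has been already estimated. Show that the normalized observation
weights are:

$$
w_{i,s+1} = \frac{w_{i,s}\, e^{w_s \mathbb{1}\{y_i \ne \hat{y}_{i,s}\}}}{\sum_{i'=1}^{n} w_{i',s}\, e^{w_s \mathbb{1}\{y_{i'} \ne \hat{y}_{i',s}\}}}
$$

where $w_s$ is a parameter to determine.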


5. Conclude on these results. 


15.4.10 Weighted estimation 


We note w = (wi,...,Wn) the vector of observation weights. 


1. We consider the weighted log-likelihood function: 


(a) Define the weighted maximum likelihood estimator. 


(b) Find the expression of the Jacobian and Hessian matrices. 


2. We consider neural networks (Exercise 15.4.7 on page 1025).

(a) Define the least squares loss function $\mathcal{L}_w(\theta)$. Give the matrix form of the deriva-
tives $\partial_\gamma \mathcal{L}_w(\theta)$ and $\partial_\beta \mathcal{L}_w(\theta)$.


(b) Same question if we consider the cross-entropy loss function.

3. We consider the soft margin SVM classification (Exercise 15.4.8 on page 1027).


(a) Define the optimization problem.
(b) What is the impact of introducing weights on the primal and dual problems?


(c) Why does weighted hard margin classification not make sense?


Conclusion 


In the past forty years, risk management has considerably changed in the banking industry 
and more broadly in the financial sector. There are certainly two main reasons. The first one 
is due to the development of financial markets, the innovation of financial solutions and the 
competitiveness between financial agents. Since products and operations are more complex 
than in the past, it is quite normal that risk management has followed the trend. However,
this first reason only partially explains the development of risk management. The second reason is
that carrying on banking and financial business requires a strong risk management, because 
risk pricing has become the essential element to ensure the sustainability of the financial 
institution. On top of that, regulation has perfectly understood the role of the banking 
sector on the economy, which is positive for boosting the economic growth but may also be 
a problem of systemic risk. In particular, the 2008 Global Financial Crisis has completely 
changed the approach of regulators, and the place of risk management in the financial sector. 
Before 2008, risk management was a tool for banks for managing their own business risk. 
Since 2008, risk management has become a tool for regulators and supervisors for managing 
the systemic risk of the whole financial sector. This is particularly true for banks, but this 
phenomenon is now expanding to other financial institutions such as insurance companies 
and asset managers. 


This handbook reflects the evolution of risk management over the last forty years. Besides
the presentation of statistical and mathematical tools that are necessary for measuring and 
managing financial risks, it gives the guidelines of the banking and financial regulation 
and introduces the different methods for computing capital requirements. This handbook 
illustrates that there is no other industry in the world, where the regulation is so complex 
and strong. It also illustrates that financial risk management is highly mathematical and 
technical. The combination of these two dimensions makes the practice of risk management 
a difficult exercise. In this handbook, we use many examples, provide many illustrations and 
propose many exercises in order for the student to gain a strong knowledge in the practice 
of risk measurement. Measuring the risk is the first step before managing it. Therefore, 
this handbook does not claim to give recipes in order to take the right risk management 
decisions. This skill develops with work experience and real-life situations. However, this
handbook aims to give the student the essential background in risk measurement and
regulatory rules needed to become a risk manager.

Appendix A


Technical Appendix 


A.1 Numerical analysis 


A.1.1 Linear algebra 


Following Horn and Johnson (2012), we recall some definitions about matrices¹:

• the square matrix A is symmetric if it is equal to its transpose $A^\top$;

• the square matrix A is hermitian if it is equal to its own conjugate transpose $A^*$,
implying that we have $A_{i,j} = \operatorname{conj} A_{j,i}$;

• we say that A is an orthogonal matrix if we have $AA^\top = A^\top A = I$ and a unitary
matrix if we have $A^* = A^{-1}$;

• $A^+$ is the Moore-Penrose inverse or pseudo-inverse of A if $AA^+A = A$, $A^+AA^+ = A^+$,
and $AA^+$ and $A^+A$ are hermitian matrices; in the case where A is invertible, we have
$A^+ = A^{-1}$; when A has linearly independent columns, we have $A^+ = (A^\top A)^{-1}A^\top$.
A.1.1.1 Eigendecomposition 


The value $\lambda$ is an eigenvalue of the n × n matrix A if there exists a non-zero eigenvector
v such that we have $Av = \lambda v$. Let V denote the matrix composed of the n eigenvectors. We
have:

$$
AV = V\Lambda
$$

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix of eigenvalues. We finally obtain the
eigendecomposition of the matrix A:

$$
A = V\Lambda V^{-1} \tag{A.1}
$$

If A is a hermitian matrix, then the matrix V of eigenvectors is unitary. It follows that:

$$
A = V\Lambda V^*
$$

In particular, if A is a symmetric real matrix, we obtain²:

$$
A = V\Lambda V^\top \tag{A.2}
$$

1 To go further, the reader may consult the book of Meucci (2005), which contains an extensive presen-
tation of linear algebra tools used in risk management.
2 We have:

$$
A^\top = \left(V\Lambda V^{-1}\right)^\top = \left(V^{-1}\right)^\top \Lambda V^\top
$$

We deduce that $V^{-1} = V^\top$.

Remark 204 A related decomposition is the singular value decomposition. Let A be a rect-
angular matrix with dimension m × n. We have:

$$
A = U\Sigma V^* \tag{A.3}
$$

where U is a m × m unitary matrix, $\Sigma$ is a m × n diagonal matrix with elements $\sigma_i \ge 0$
and V is a n × n unitary matrix. $\sigma_i$ are the singular values of A, $u_i$ are the left singular
vectors of A, and $v_i$ are the right singular vectors of A.
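Equations (A.2) and (A.3) can be verified numerically. The sketch below uses NumPy with two small matrices chosen arbitrarily for the illustration (they do not come from the text); `numpy.linalg.eigh` is used because the first matrix is symmetric.

```python
import numpy as np

# Symmetric real matrix: A = V Lambda V' with V orthogonal
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
eigenvalues, V = np.linalg.eigh(A)
A_rebuilt = V @ np.diag(eigenvalues) @ V.T

# Rectangular matrix (3 x 2): M = U Sigma V*
M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, sigma, Vh = np.linalg.svd(M, full_matrices=True)
Sigma = np.zeros(M.shape)                            # m x n diagonal matrix
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)
M_rebuilt = U @ Sigma @ Vh
```

Here `Vh` is already the conjugate transpose $V^*$, which is the convention returned by `numpy.linalg.svd`.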


A.1.1.2 Generalized eigendecomposition 


The generalized eigenvalue problem is $Av = \lambda Bv$ where A and B are two n × n matrices.
In a matrix form, we have:

$$
AV = BV\Lambda
$$

where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix of ordered generalized eigenvalues. The
generalized eigenvalue problem is related to the maximum/minimum of the Rayleigh quo-
tient:

$$
\mathcal{R}(x) = \frac{x^\top A x}{x^\top B x}
$$

Indeed, we get $x^{\star} = v_1$ and $\mathcal{R}(x^{\star}) = \lambda_1$ for the maximization problem $x^{\star} = \arg\max \mathcal{R}(x)$
and $x^{\star} = v_n$ and $\mathcal{R}(x^{\star}) = \lambda_n$ for the minimization problem $x^{\star} = \arg\min \mathcal{R}(x)$.


A.1.1.3 Schur decomposition 


The Schur decomposition of the n × n matrix A is equal to:

$$
A = QTQ^* \tag{A.4}
$$

where Q is a unitary matrix and T is an upper triangular matrix³. This decomposition is
useful to calculate matrix functions.


Let us consider the matrix function in the space $\mathbb{M}$ of square matrices:

$$
\begin{aligned}
f : \mathbb{M} & \longrightarrow \mathbb{M} \\
A & \longmapsto B = f(A)
\end{aligned}
$$

For instance, if $f(x) = \sqrt{x}$ and A is positive, we can define the matrix B such that:

$$
BB^* = B^*B = A
$$

B is called the square root of A and we note $B = A^{1/2}$. This matrix function generalizes the
scalar-valued function to the set of matrices. Let us consider the following Taylor expansion:

$$
f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{(x - x_0)^2}{2!} f''(x_0) + \ldots
$$

We can show that if the series converges for $|x - x_0| < \alpha$, then the matrix f(A) defined by
the following expression:

$$
f(A) = f(x_0) I + (A - x_0 I) f'(x_0) + \frac{(A - x_0 I)^2}{2!} f''(x_0) + \ldots
$$

converges to the matrix B if $|A - x_0 I| < \alpha$ and we note B = f(A). In the case of the
exponential function, we have:

$$
f(x) = e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}
$$

We deduce that the exponential of the matrix A is equal to:

$$
B = e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}
$$

In a similar way, the logarithm of A is the matrix B such that $e^B = A$ and we note $B = \ln A$.

3 Q and T are also called the transformation matrix and the Schur form of A.

Let A and B be two n × n square matrices. Using the Taylor expansion, Golub and
Van Loan (2013) showed that $f(A^\top) = f(A)^\top$, $A f(A) = f(A) A$ and $f(B^{-1}AB) =
B^{-1} f(A) B$. It follows that:

$$
e^{A^\top} = \left(e^A\right)^\top
$$

and:

$$
e^{B^{-1}AB} = B^{-1} e^A B
$$

If AB = BA, we can also prove that $Ae^B = e^B A$ and $e^{A+B} = e^A e^B = e^B e^A$.
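The series definition of the matrix exponential and the two properties above can be illustrated with a short NumPy sketch. It is a naive truncated series given for illustration only (the function name `expm_taylor`, the number of terms and the test matrices are assumptions); it is not the algorithm recommended for production use.

```python
import numpy as np


def expm_taylor(A, n_terms=60):
    """Truncated series sum_{k < n_terms} A^k / k! (adequate for small norms only)."""
    A = np.asarray(A, dtype=float)
    term = np.eye(A.shape[0])      # A^0 / 0!
    total = term.copy()
    for k in range(1, n_terms):
        term = term @ A / k        # A^k / k! from A^(k-1) / (k-1)!
        total = total + term
    return total


# Symmetric case: e^A = V exp(Lambda) V'
A = np.array([[0.5, 0.2],
              [0.2, -0.3]])
lam, V = np.linalg.eigh(A)
exp_A_eig = V @ np.diag(np.exp(lam)) @ V.T

# General case: transpose and similarity properties
C = np.array([[0.1, 0.4],
              [-0.2, 0.3]])
B = np.array([[2.0, 1.0],
              [0.0, 1.0]])
B_inv = np.linalg.inv(B)
lhs = expm_taylor(B_inv @ C @ B)
rhs = B_inv @ expm_taylor(C) @ B
```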


Remark 205 There are different ways to compute numerically f(A). For transcendental
functions, we have:

$$
f(A) = Q f(T) Q^*
$$

where $A = QTQ^*$ is the Schur decomposition of A. Because T is an upper triangular matrix,
f(T) is also an upper triangular matrix whose elements can be calculated with Algorithm 9.1.1 of
Golub and Van Loan (2013). This algorithm is reproduced below⁴.


A.1.2 Approximation methods 
A.1.2.1 Spline functions 


We consider a set of data points $(x_i, y_i)$ where $x_1 < x_2 < \cdots < x_n$. S(x) is the associated
cubic spline if S(x) is a $C^2$ function, $S(x_i) = y_i$ and S(x) is a polynomial of degree 3 on
each interval:

$$
S(x) = a_i + b_i x + c_i x^2 + d_i x^3 \quad \text{if } x \in [x_i, x_{i+1}]
$$

The $C^2$ property implies that:

$$
\begin{aligned}
a_{i-1} + b_{i-1} x_i + c_{i-1} x_i^2 + d_{i-1} x_i^3 & = a_i + b_i x_i + c_i x_i^2 + d_i x_i^3 \\
b_{i-1} + 2c_{i-1} x_i + 3d_{i-1} x_i^2 & = b_i + 2c_i x_i + 3d_i x_i^2 \\
2c_{i-1} + 6d_{i-1} x_i & = 2c_i + 6d_i x_i
\end{aligned}
$$

Therefore, we obtain a linear system of 4n equations with 4n unknowns. By assuming that
$c_0 = d_0 = c_n = d_n = 0$⁵, the linear system is tridiagonal and easy to solve. The main
interest of cubic splines is their tractability, because it is straightforward to calculate the
quantities $S(x)$, $S'(x)$, $S''(x)$, $S^{-1}(x)$ and $\int_{x_1}^{x} S(u)\,\mathrm{d}u$ for any value x. This explains why
they are extensively used in finance.


4 For the exponential matrix, we may prefer to use the Padé approximation method, which is described
in Algorithm 9.3.1 (scaling and squaring) of Golub and Van Loan (2013). See also the survey of Moler and
Van Loan (2003).

5 This is equivalent to imposing that the cubic spline is linear if $x < x_1$ and $x > x_n$.

Algorithm 3 Schur-Parlett matrix function f(A)
    Compute the Schur decomposition A = QTQ*
    Initialize F to the matrix 0_{n×n}
    for i = 1 : n do
        f_{i,i} ← f(t_{i,i})
    end for
    for p = 1 : n − 1 do
        for i = 1 : n − p do
            j ← i + p
            s ← t_{i,j} (f_{j,j} − f_{i,i})
            for k = i + 1 : j − 1 do
                s ← s + t_{i,k} f_{k,j} − f_{i,k} t_{k,j}
            end for
            f_{i,j} ← s / (t_{j,j} − t_{i,i})
        end for
    end for
    B ← QFQ*
    return B


Source: Golub and Van Loan (2013), page 519.
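The inner recurrence of Algorithm 3 can be sketched in Python for a matrix that is already upper triangular, which avoids the Schur step. This is an illustrative transcription under two assumptions: the function name `parlett` is hypothetical, and the diagonal elements of T are supposed to be distinct (otherwise the division by $t_{j,j} - t_{i,i}$ fails).

```python
import numpy as np


def parlett(T, f):
    """Apply the scalar function f to an upper triangular matrix T.

    Assumes that the diagonal elements of T are distinct.
    """
    n = T.shape[0]
    F = np.zeros_like(T, dtype=float)
    for i in range(n):
        F[i, i] = f(T[i, i])
    # Fill the superdiagonals one at a time (p is the distance to the diagonal)
    for p in range(1, n):
        for i in range(n - p):
            j = i + p
            s = T[i, j] * (F[j, j] - F[i, i])
            for k in range(i + 1, j):
                s += T[i, k] * F[k, j] - F[i, k] * T[k, j]
            F[i, j] = s / (T[j, j] - T[i, i])
    return F


T = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 9.0]])
sqrt_T = parlett(T, np.sqrt)    # matrix square root of T
exp_T = parlett(T, np.exp)      # matrix exponential of T
```

The recurrence is derived from the commutation property $F T = T F$, which explains why nearly equal diagonal elements make it numerically unstable.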


Remark 206 The interpolation method can be extended to the smoothing problem:

$$
\min\; p\sum_{i=1}^{n}\left(y_i - S(x_i)\right)^2 + (1 - p)\int_{x_1}^{x_n} S''(u)^2\,\mathrm{d}u
$$

where p is the smoothing parameter. We obtain the cubic spline solution when p is equal to
1, whereas we obtain the least squares solution when p is equal to 0. In the general case, the
first-order condition consists in solving a band linear system.


A.1.2.2 Positive definite matrix approximation 


The computation of Gaussian risk measures involves the use of covariance or correlation
matrices. Since we can manipulate many instruments and securities, we generally observe
missing values in the dataset. Therefore, several approaches can be used to estimate the
covariance matrix $\Sigma$. The two most popular approaches are listwise and pairwise methods.
Listwise deletion removes all the observations that have one or more missing values. Although
this approach is popular, it is often impossible to implement from a practical point of view. For
instance, deleting all the public holidays dramatically reduces the number of valid dates in
a global universe of stocks. This is why pairwise deletion is used in practice. It consists of
deleting the observations by considering each pair of variables. However, the estimated
covariance matrix $\hat\Sigma$ is generally not positive definite. Another issue occurs when the number
of observations is lower than the number of variables. In this case, $\hat\Sigma$ is only positive semi-
definite.


Computing the nearest covariance matrix We assume that $\hat\Sigma$ is not a positive semi-
definite matrix. We consider the square root decomposition $\hat\Sigma = A^2$ where $A = A_1 + iA_2$.
We have $A^2 = A_1^2 - A_2^2$ because $A_1 A_2 = 0$ (Horn and Johnson, 2012). We deduce that the
eigenvalues $\lambda_i(\hat\Sigma)$ of $\hat\Sigma$ are related to the eigenvalues of $A_1$ and $A_2$:

$$
\lambda_i(\hat\Sigma) =
\begin{cases}
\lambda_i^2(A_1) & \text{if } \lambda_i(\hat\Sigma) \ge 0 \\
-\lambda_i^2(A_2) & \text{otherwise}
\end{cases}
$$

Therefore, $\hat\Sigma$ can be approximated by $\tilde\Sigma = A_1^2$, which is a positive semi-definite matrix.
Moreover, we have:

$$
\begin{aligned}
x^\top \hat\Sigma x & = x^\top \left(A_1^2 - A_2^2\right) x \\
& = x^\top A_1^2 x - x^\top A_2^2 x \\
& \le x^\top A_1^2 x
\end{aligned}
$$

This means that any quadratic form $x^\top \hat\Sigma x$ is bounded by $x^\top \tilde\Sigma x$, and therefore that $\tilde\Sigma$ is a
conservative estimator when computing the Gaussian value-at-risk.


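Remark 207 We can transform any positive semi-definite matrix $\Sigma$ into a (strict) positive
definite matrix $\tilde{\Sigma}$ by considering the eigenvalue thresholding method
$\tilde{\Sigma} = V \tilde{\Lambda} V^\top$ where $\Sigma = V \Lambda V^\top$,
$\tilde{\Lambda}_{i,i} = \max(\Lambda_{i,i}, \varepsilon)$ and $\varepsilon > 0$ is a small number.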
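The two approximations above can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the function name `nearest_psd` and the 3 x 3 matrix are hypothetical and are not taken from the text. Setting the negative eigenvalues to zero gives $A_1^2$, while a strictly positive threshold gives the matrix of Remark 207.

```python
import numpy as np

def nearest_psd(sigma, eps=0.0):
    """Clip the eigenvalues of a symmetric matrix at eps (eps = 0 gives A_1^2)."""
    sym = 0.5 * (sigma + sigma.T)            # enforce symmetry before decomposing
    eigval, eigvec = np.linalg.eigh(sym)     # sym = V diag(eigval) V^T
    clipped = np.maximum(eigval, eps)        # eigenvalue thresholding
    return eigvec @ np.diag(clipped) @ eigvec.T

# Hypothetical pairwise estimate that is not positive semi-definite
sigma_hat = np.array([[1.0, 0.9, -0.9],
                      [0.9, 1.0, 0.9],
                      [-0.9, 0.9, 1.0]])

sigma_tilde = nearest_psd(sigma_hat)             # positive semi-definite approximation
sigma_pd = nearest_psd(sigma_hat, eps=1e-6)      # strictly positive definite version
```

Since the negative part of the spectrum is discarded, every quadratic form computed with `sigma_tilde` is at least as large as the one computed with `sigma_hat`, which is the conservative property stated above.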


Computing the nearest correlation matrix Given an arbitrary symmetric matrix $A$,
the nearest correlation matrix is defined as follows:

$$\rho(A) = \min \left\{ \|A - X\|_2 : X \text{ is a correlation matrix} \right\}$$

For solving this problem, Higham (2002) proposed to use the method of alternating
projections, which consists of iterating $A \leftarrow P_U(P_S(A))$ where $P_U$ and $P_S$ are the
projections on the sets $S = \{X = X^\top : X \geq 0\}$ and $U = \{X = X^\top : \mathrm{diag}(X) = \mathbf{1}_n\}$.
There are different approaches to achieve the convergence. Higham (2002) considered Dykstra's
method given in Algorithm 4. For the projections, we have $P_S(R) = Q T^+ Q^*$ where $Q T Q^*$ is the
Schur decomposition of $R$ and $T^+_{i,j} = \max(T_{i,j}, 0)$, and $P_U(Y) = X$ where $X_{i,i} = 1$ and
$X_{i,j} = Y_{i,j}$ for $i \neq j$.


Algorithm 4 Computing the nearest correlation matrix

The goal is to compute ρ(A)
We set ΔS_0 = 0 and X_0 = A
We note ε the convergence criterion of the algorithm
repeat
   R_k ← X_{k−1} − ΔS_{k−1}
   Y_k ← P_S(R_k)
   ΔS_k ← Y_k − R_k
   X_k ← P_U(Y_k)
until ‖X_k − X_{k−1}‖_2 ≤ ε
return ρ(A) ← X_k
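Algorithm 4 can be sketched as follows. This is an illustrative implementation under stated assumptions, not Higham's reference code: it assumes NumPy, uses an eigendecomposition for the projection $P_S$ (the Schur decomposition of a symmetric matrix), measures convergence with the Frobenius norm, and the names `proj_psd`, `proj_unit_diag` and `nearest_correlation` are hypothetical.

```python
import numpy as np

def proj_psd(r):
    """Projection P_S: set the negative eigenvalues of a symmetric matrix to zero."""
    eigval, eigvec = np.linalg.eigh(0.5 * (r + r.T))
    return (eigvec * np.maximum(eigval, 0.0)) @ eigvec.T

def proj_unit_diag(y):
    """Projection P_U: keep the off-diagonal terms and set the diagonal to one."""
    x = y.copy()
    np.fill_diagonal(x, 1.0)
    return x

def nearest_correlation(a, tol=1e-10, max_iter=10_000):
    """Alternating projections with Dykstra's correction (Algorithm 4)."""
    x = a.copy()
    delta_s = np.zeros_like(a)
    for _ in range(max_iter):
        r = x - delta_s                  # R_k = X_{k-1} - Delta S_{k-1}
        y = proj_psd(r)                  # Y_k = P_S(R_k)
        delta_s = y - r                  # Delta S_k = Y_k - R_k
        x_new = proj_unit_diag(y)        # X_k = P_U(Y_k)
        if np.linalg.norm(x_new - x, ord="fro") <= tol:
            return x_new
        x = x_new
    return x

# Hypothetical symmetric matrix with unit diagonal that is not a correlation matrix
a = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
rho = nearest_correlation(a)
```

For this particular input, the three off-diagonal terms shrink symmetrically from 0.9 to 0.5 in absolute value, at which point the smallest eigenvalue reaches zero.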


A.1.2.3 Numerical integration 


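Trapezoidal and Simpson's rules The general approach to calculate
$I(a,b) = \int_a^b f(x)\,\mathrm{d}x$ is to approximate the integral by a sum
$\hat{I}(a,b) = \sum_{i} w_i f(x_i)$. Let $x_i = a + i \cdot h$ where $h = (b - a)/n$ and $n$ is
the number of subintervals, meaning that there are $n + 1$ knots $x_0, \ldots, x_n$. We have:

$$\int_a^b f(x)\,\mathrm{d}x \approx \sum_{i=1}^{n} \frac{h}{2}\left(f(x_{i-1}) + f(x_i)\right) = h\left(\frac{1}{2} f(a) + \sum_{i=1}^{n-1} f(x_i) + \frac{1}{2} f(b)\right)$$

There is no difficulty to implement the trapezoidal method. Moreover, we can show that
the quadrature error is:

$$\int_a^b f(x)\,\mathrm{d}x - \hat{I}(a,b) = -\frac{h^2}{12}(b - a)\, f''(c)$$

where $c \in [a,b]$. The error decreases when the discretization step $h$ is reduced and depends
on the curvature $f''$. If the curvature is high, it would be better to use Simpson's rule. This
method consists in replacing the function $f(x)$ on the interval $[x_{i-1}, x_{i+1}]$ by the parabolic
function that matches the points $f(x_{i-1})$, $f(x_i)$ and $f(x_{i+1})$. For that, we estimate the
curvature by the finite difference:

$$f''(x_i) \approx \frac{f(x_{i-1}) - 2 f(x_i) + f(x_{i+1})}{h^2}$$

We obtain:

$$\int_{x_{i-1}}^{x_{i+1}} f(x)\,\mathrm{d}x \approx \frac{h}{3}\left(f(x_{i-1}) + 4 f(x_i) + f(x_{i+1})\right)$$

Simpson's rule is then (for an even number $n$):

$$\int_a^b f(x)\,\mathrm{d}x \approx \frac{h}{3}\left(f(a) + 4 f(x_1) + 2 f(x_2) + \cdots + 2 f(x_{n-2}) + 4 f(x_{n-1}) + f(b)\right)$$
$$\phantom{\int_a^b f(x)\,\mathrm{d}x} = \frac{h}{3}\left(f(a) + 4 \sum_{i=1}^{n/2} f(x_{2i-1}) + 2 \sum_{i=1}^{n/2-1} f(x_{2i}) + f(b)\right)$$

In this case, the quadrature error becomes:

$$\int_a^b f(x)\,\mathrm{d}x - \hat{I}(a,b) = -\frac{h^4}{180}(b - a)\, f^{(4)}(c)$$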
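The two rules translate directly into code. The sketch below uses only the Python standard library; the function names and the test integrand $\sin x$ on $[0, \pi]$ (whose integral is 2) are illustrative choices, not part of the text.

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoidal rule with n subintervals of length h = (b - a) / n."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

def simpson(f, a, b, n):
    """Simpson's rule; n must be even so that the knots can be grouped by pairs."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))   # x_1, x_3, ...
    even = sum(f(a + 2 * i * h) for i in range(1, n // 2))            # x_2, x_4, ...
    return h / 3.0 * (f(a) + 4.0 * odd + 2.0 * even + f(b))

exact = 2.0  # integral of sin(x) on [0, pi]
err_trap = abs(trapezoid(math.sin, 0.0, math.pi, 16) - exact)
err_simp = abs(simpson(math.sin, 0.0, math.pi, 16) - exact)
```

With the same number of function evaluations, the Simpson error is much smaller here, in line with the $h^2$ and $h^4$ orders of the two error terms.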


Gaussian Quadratures One of the most popular methods of numerical integration is the
quadrature method with irregular steps when we approximate the function by a polynomial.
In the case of Gaussian quadratures, Golub and Welsch (1969) showed that, if
$f(x) = B(x) P(x)$ where $P \in \mathcal{P}_{2n-1}$ and $\mathcal{P}_n$ is the set of polynomials of
order $n$, then there exists a set of knots $0 < x_1 < x_2 < \cdots < x_n < 1$ such that⁶:

$$I(0,1) = \int_0^1 f(x)\,\mathrm{d}x = \sum_{i=1}^{n} w_i f(x_i)$$

where $w_i$ are positive weights. If the function $f(x)$ is not a polynomial but sufficiently
regular with respect to $P(x)$, $G(f) = \sum_{i=1}^{n} w_i f(x_i)$ is an approximation of the integral
$I(0,1)$. To compute the weights and knots, we have to specify the basis function $B(x)$ and
the support. $(w_i, x_i)$ is then the eigenvalue solution of a Jacobi matrix. For example, Figure
A.1 shows $(w_i, x_i)$ in the case of Gauss-Legendre quadratures, which are used with functions
with a finite support and $B(x) = 1$. An important point is that the extension to dimensions
larger than one is straightforward (Davis and Rabinowitz, 1984).

⁶ If the support is $[a,b]$, we use the change of variable $y = (x - a)/(b - a)$:

$$\int_a^b f(x)\,\mathrm{d}x = (b - a) \int_0^1 f\left(a + (b - a)\, y\right) \mathrm{d}y$$


[Figure: four panels plotting the weights $w_i$ against the knots $x_i \in [-1, 1]$ for different orders $n$]
FIGURE A.1: Weights and knots of the Gauss-Legendre quadrature


We consider the function $f(x) = 2\pi\omega \cos(2\pi\omega x) + 2x$. In Figure A.2, we represent
$f(x)$, the analytical value $x^2 + \sin(2\pi\omega x)$ of $I(0,x) = \int_0^x f(t)\,\mathrm{d}t$ and the
numerical approximation $\hat{I}(0,x)$ when $\omega = 1$ and $\omega = 8$. We notice that the
approximation error depends on the order $n$ of the quadrature and the upper bound $x$ of the
integral. For a fixed value $n$, the error generally increases with $x$. In order to understand the
accuracy of the numerical solution, we must verify that $f(x)$ is sufficiently regular with respect
to the polynomial $P \in \mathcal{P}_{2n-1}$. In Figure A.3, we observe that the adjustment of $f(x)$
for $x \in [0,10]$ is bad when $n = 10$ and $n = 16$, but it is better when $n = 36$ and $n = 200$.

A difficulty concerns functions whose support is not finite. In Table A.1, we report
the value $x_n$ of the last knot for Gauss-Laguerre and Gauss-Hermite quadratures. The
use of these methods implies that the approximations
$\int_0^{\infty} f(x)\,\mathrm{d}x \approx \int_0^{x_n} f(x)\,\mathrm{d}x$ and
$\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x \approx \int_{-x_n}^{x_n} f(x)\,\mathrm{d}x$ are valid.


FIGURE A.3: Legendre approximation of $f(x) = 2\pi \cos(2\pi x) + 2x$

TABLE A.1: Value $x_n$ of the last knot in Gauss-Laguerre and Gauss-Hermite quadratures

   n | Laguerre | Hermite
   4 |   9.3951 |  1.6507
   8 |  22.8631 |  2.9306
  16 |  51.7012 |  4.6887
  32 | 111.7514 |  7.1258
 100 | 374.9841 | 13.4065
 200 | 767.8000 | 19.3300


Remark 208 We can show that the knots correspond to the roots of the polynomial⁷. Once
the roots are determined, we can calculate the weights with the following condition (Stoer
and Bulirsch, 1993):

$$\int_a^b w(x)\, x^k P_n(x)\,\mathrm{d}x = 0$$

for all $k = 0, 1, \ldots, n-1$. The weights are then the solution of a linear system. Abramowitz
and Stegun (1970) have tabulated the solution $(w_i, x_i)$ for the best-known quadratures
(Legendre, Laguerre, Hermite, etc.) and different values of $n$. These methods are now standard
and widely implemented in numerical software.

⁷ For example, the Legendre polynomial is defined as:

$$P_n(x) = \frac{1}{2^n} \sum_{i=0}^{n} \binom{n}{i}^2 (x - 1)^{n-i} (x + 1)^{i}$$
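As an illustration of such implementations, the sketch below assumes NumPy, whose `numpy.polynomial` module provides the knots and weights of the Gauss-Legendre, Gauss-Laguerre and Gauss-Hermite rules. The helper name `gauss_legendre` and the list of orders are illustrative choices; the integrand is the test function $f(x) = 2\pi\omega\cos(2\pi\omega x) + 2x$ with $\omega = 1$.

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    """Approximate the integral of f on [a, b] with an n-point Gauss-Legendre rule."""
    knots, weights = np.polynomial.legendre.leggauss(n)   # rule defined on [-1, 1]
    x = 0.5 * (b - a) * knots + 0.5 * (a + b)             # map the knots to [a, b]
    return 0.5 * (b - a) * np.sum(weights * f(x))

omega = 1.0
f = lambda x: 2 * np.pi * omega * np.cos(2 * np.pi * omega * x) + 2 * x
exact = lambda x: x**2 + np.sin(2 * np.pi * omega * x)

upper = 10.0
errors = {n: abs(gauss_legendre(f, 0.0, upper, n) - exact(upper))
          for n in (10, 16, 36, 64)}

# Last knot of the 4-point Gauss-Laguerre and Gauss-Hermite rules (first row of Table A.1)
last_laguerre = np.polynomial.laguerre.laggauss(4)[0][-1]
last_hermite = np.polynomial.hermite.hermgauss(4)[0][-1]
```

The dictionary `errors` makes the dependence on the order visible: the rule is only reliable once $n$ is large enough for a polynomial of degree $2n - 1$ to follow the ten oscillations of the integrand on $[0, 10]$.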


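Quadrature methods can be extended when we consider functions with several variables:

$$\int_0^1 \int_0^1 f(x,y)\,\mathrm{d}x\,\mathrm{d}y \approx \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j f(x_i, x_j)$$

We can also consider non-constant bounds:

$$\int_a^b \left( \int_{g_1(x)}^{g_2(x)} f(x,y)\,\mathrm{d}y \right) \mathrm{d}x$$

and we have:

$$\int_a^b \left( \int_{g_1(x)}^{g_2(x)} f(x,y)\,\mathrm{d}y \right) \mathrm{d}x \approx \sum_{i=1}^{n} \sum_{j=1}^{n} w_i^{[a,b]}\, w_j^{[g_1(x_i^{[a,b]}),\, g_2(x_i^{[a,b]})]}\, f\left(x_i^{[a,b]},\, x_j^{[g_1(x_i^{[a,b]}),\, g_2(x_i^{[a,b]})]}\right)$$

where $(w_i^{A}, x_i^{A})$ indicates the weights and knots of the Gauss-Legendre quadrature
associated to the support $A$.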


A.1.2.4 Finite difference methods 
We follow Kurpiel and Roncalli (2000) and consider the linear parabolic equation:

$$\frac{\partial u(t,x)}{\partial t} + c(t,x)\, u(t,x) = \mathcal{A}_t u(t,x) + d(t,x) \tag{A.5}$$

where $\mathcal{A}_t$ is the elliptic differential operator:

$$\mathcal{A}_t u(t,x) = a(t,x) \frac{\partial^2 u(t,x)}{\partial x^2} + b(t,x) \frac{\partial u(t,x)}{\partial x} \tag{A.6}$$

The goal is to solve Equation (A.5) for $t \in [t^-, t^+]$ and $x \in [x^-, x^+]$. In this case, we use
the finite difference method, which is well adapted to second-order parabolic equations in $x$.
For that, we introduce a uniform finite-difference mesh for $t$ and $x$. Let $N_t$ and $N_x$ be the
number of discretization points for $t$ and $x$ respectively. We denote by $k$ and $h$ the mesh
spacings. We have:

$$k = \frac{t^+ - t^-}{N_t - 1}, \qquad h = \frac{x^+ - x^-}{N_x - 1}$$

and:

$$t_m = t^- + m \cdot k, \qquad x_i = x^- + i \cdot h$$

Let $u_i^m$ be the approximate solution to (A.5) at the grid point $(t_m, x_i)$ and $u(t_m, x_i)$ the
exact solution of the partial differential equation at this point.

⁸ Generally, $n$ takes the values 2, 4, 8, 16, 32, 64 and 128.


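Discretization in space If we consider the central difference method to approximate the
derivatives, we have:

$$\frac{\partial u(t,x)}{\partial x} \approx \frac{u_{i+1}^m - u_{i-1}^m}{2h}$$

and:

$$\frac{\partial^2 u(t,x)}{\partial x^2} \approx \frac{u_{i+1}^m - 2u_i^m + u_{i-1}^m}{h^2}$$

Equation (A.5) becomes:

$$\frac{\partial u(t,x)}{\partial t} + c_i^m u_i^m = A_i^m + d_i^m$$

where:

$$A_i^m = a_i^m\, \frac{u_{i+1}^m - 2u_i^m + u_{i-1}^m}{h^2} + b_i^m\, \frac{u_{i+1}^m - u_{i-1}^m}{2h}$$

We finally obtain:

$$\frac{\partial u(t,x)}{\partial t} = B_i^m$$

where:

$$B_i^m = A_i^m + d_i^m - c_i^m u_i^m$$
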
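Discretization in time The most classical method to solve Equation (A.5) is to use the
Euler scheme. We have:

$$\frac{\partial u(t,x)}{\partial t} \approx \frac{u_i^m - u_i^{m-1}}{k}$$

We also notice that Equation (A.5) becomes:

$$\frac{u_i^m - u_i^{m-1}}{k} + c_i^m u_i^m = \mathcal{A}_t u(t,x) + d_i^m$$

However, the function $\mathcal{A}_t u(t,x)$ depends both on time $t$ and space $x$. That is why
we cannot employ the traditional Euler algorithm:

$$u_i^m = u_i^{m-1} + k \left( \mathcal{A}_t u(t,x) + d_i^m - c_i^m u_i^m \right)$$

In this case, we replace the function $\mathcal{A}_t u(t,x)$ by its numerical approximation
$A_i^m$. Therefore, we have:

$$u_i^m = u_i^{m-1} + k \left( A_i^m + d_i^m - c_i^m u_i^m \right) = u_i^{m-1} + k B_i^m$$

The θ-scheme method In the previous paragraph, we have used the single-sided forward
difference to approximate the derivative $\partial_t u(t,x)$. The θ-scheme method is a combination
of left-sided and right-sided differences. Let $\theta \in [0,1]$. We have:

$$u_i^m = u_i^{m-1} + k \left( (1 - \theta) B_i^{m-1} + \theta B_i^m \right)$$

Using the expression of $B_i^m$, we obtain:

$$\begin{aligned}
u_i^m = {} & u_{i-1}^{m-1} \left( a_i^{m-1} (1-\theta) \frac{k}{h^2} - b_i^{m-1} (1-\theta) \frac{k}{2h} \right) \\
& + u_i^{m-1} \left( 1 - 2 a_i^{m-1} (1-\theta) \frac{k}{h^2} - c_i^{m-1} (1-\theta)\, k \right) \\
& + u_{i+1}^{m-1} \left( a_i^{m-1} (1-\theta) \frac{k}{h^2} + b_i^{m-1} (1-\theta) \frac{k}{2h} \right) \\
& + u_{i-1}^{m} \left( a_i^{m} \theta \frac{k}{h^2} - b_i^{m} \theta \frac{k}{2h} \right) \\
& + u_i^{m} \left( - 2 a_i^{m} \theta \frac{k}{h^2} - c_i^{m} \theta k \right) \\
& + u_{i+1}^{m} \left( a_i^{m} \theta \frac{k}{h^2} + b_i^{m} \theta \frac{k}{2h} \right) + \psi_i^m
\end{aligned}$$

where:

$$\psi_i^m = d_i^{m-1} (1 - \theta)\, k + d_i^m \theta k$$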


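The different numerical algorithms We introduce the following notations:

$$\begin{aligned}
\alpha_i^m &= a_i^m \frac{k}{h^2} - b_i^m \frac{k}{2h} \\
\beta_i^m &= 1 - 2 a_i^m \frac{k}{h^2} - c_i^m k \\
\gamma_i^m &= a_i^m \frac{k}{h^2} + b_i^m \frac{k}{2h}
\end{aligned}$$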

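The explicit scheme corresponds to $\theta = 0$. We have then:

$$u_i^m = \alpha_i^{m-1} u_{i-1}^{m-1} + \beta_i^{m-1} u_i^{m-1} + \gamma_i^{m-1} u_{i+1}^{m-1} + d_i^{m-1} k \tag{A.7}$$

We obtain the numerical solution by iterating Equation (A.7) from the initial condition and
using Dirichlet conditions. The implicit scheme corresponds to $\theta = 1$. We have then:

$$\alpha_i^m u_{i-1}^m + \left( \beta_i^m - 2 \right) u_i^m + \gamma_i^m u_{i+1}^m = - \left( u_i^{m-1} + d_i^m k \right) \tag{A.8}$$

We obtain the numerical solution by solving the linear system (A.8) and using Neumann
conditions. In mixed schemes, we have $\theta \in \,]0,1[$. In particular, we can show that the
algorithm is stable if $\theta \geq 1/2$. In the general case, the stability assumption is verified if:


k> oN AA

For instance, the well-known Crank-Nicolson scheme corresponds to $\theta = 1/2$. By introducing
the following notations:

$$\begin{aligned}
\varsigma_i^m &= (1 - \theta)\, \alpha_i^m \\
\tau_i^m &= 1 + (1 - \theta) \left( \beta_i^m - 1 \right) \\
\upsilon_i^m &= (1 - \theta)\, \gamma_i^m \\
\varphi_i^m &= \theta \alpha_i^m \\
\phi_i^m &= -1 + \theta \left( \beta_i^m - 1 \right) \\
\chi_i^m &= \theta \gamma_i^m \\
\psi_i^m &= (1 - \theta)\, d_i^{m-1} k + \theta d_i^m k
\end{aligned}$$

the linear system to solve becomes:

$$\varphi_i^m u_{i-1}^m + \phi_i^m u_i^m + \chi_i^m u_{i+1}^m = - \left( \varsigma_i^{m-1} u_{i-1}^{m-1} + \tau_i^{m-1} u_i^{m-1} + \upsilon_i^{m-1} u_{i+1}^{m-1} + \psi_i^m \right)$$

The corresponding matrix form is:

$$\Lambda_m u_m = - \left( \Xi_{m-1} u_{m-1} + \Psi_m \right) + \varepsilon_m \tag{A.9}$$

where:

$$u_m = \begin{pmatrix} u_1^m \\ u_2^m \\ \vdots \\ u_{N_x-3}^m \\ u_{N_x-2}^m \end{pmatrix}, \qquad \Psi_m = \begin{pmatrix} \psi_1^m \\ \psi_2^m \\ \vdots \\ \psi_{N_x-3}^m \\ \psi_{N_x-2}^m \end{pmatrix}$$

The $\Lambda_m$ and $\Xi_m$ matrices are defined in the following manner:

$$\Lambda_m = \begin{pmatrix}
\phi_1^m & \chi_1^m & 0 & & \\
\varphi_2^m & \phi_2^m & \chi_2^m & 0 & \\
& \ddots & \ddots & \ddots & \\
& & 0 & \varphi_{N_x-2}^m & \phi_{N_x-2}^m
\end{pmatrix}$$

and:

$$\Xi_m = \begin{pmatrix}
\tau_1^m & \upsilon_1^m & 0 & & \\
\varsigma_2^m & \tau_2^m & \upsilon_2^m & 0 & \\
& \ddots & \ddots & \ddots & \\
& & 0 & \varsigma_{N_x-2}^m & \tau_{N_x-2}^m
\end{pmatrix}$$

whereas $\varepsilon_m$ is the residual absorption vector:

$$\varepsilon_m = \begin{pmatrix}
- \left( \varphi_1^m u_0^m + \varsigma_1^{m-1} u_0^{m-1} \right) \\
0 \\
\vdots \\
0 \\
- \left( \chi_{N_x-2}^m u_{N_x-1}^m + \upsilon_{N_x-2}^{m-1} u_{N_x-1}^{m-1} \right)
\end{pmatrix}$$

Integrating the boundary conditions A new form of the system of equations (A.9) is:

$$\Lambda_m u_m = v_m + \varepsilon_m$$

where:

$$v_m = - \left( \Xi_{m-1} u_{m-1} + \Psi_m \right)$$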


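The use of boundary conditions (Dirichlet and/or Neumann) leads us to modify this equation:

$$\Lambda_m^\star u_m = v_m^\star$$

where $\Lambda_m^\star$ and $v_m^\star$ are initialized with $\Lambda_m$ and $v_m$, and their first
and last elements are then modified according to the boundary conditions:

• Conditions on $x^-$

  - Dirichlet: $u(t, x^-) = u_{x^-}(t)$

    $(v_m^\star)_1 \leftarrow (v_m)_1 - \varsigma_1^{m-1} u_0^{m-1} - \varphi_1^m u_{x^-}(t_m)$

  - Neumann: $\partial_x u(t, x^-) = u'_{x^-}(t)$

    $(\Lambda_m^\star)_{1,1} \leftarrow (\Lambda_m)_{1,1} + \varphi_1^m$
    $(v_m^\star)_1 \leftarrow (v_m)_1 - \varsigma_1^{m-1} u_0^{m-1} + \varphi_1^m u'_{x^-}(t_m)\, h$

• Conditions on $x^+$

  - Dirichlet: $u(t, x^+) = u_{x^+}(t)$

    $(v_m^\star)_{N_x-2} \leftarrow (v_m)_{N_x-2} - \upsilon_{N_x-2}^{m-1} u_{N_x-1}^{m-1} - \chi_{N_x-2}^m u_{x^+}(t_m)$

  - Neumann: $\partial_x u(t, x^+) = u'_{x^+}(t)$

    $(\Lambda_m^\star)_{N_x-2,N_x-2} \leftarrow (\Lambda_m)_{N_x-2,N_x-2} + \chi_{N_x-2}^m$
    $(v_m^\star)_{N_x-2} \leftarrow (v_m)_{N_x-2} - \upsilon_{N_x-2}^{m-1} u_{N_x-1}^{m-1} - \chi_{N_x-2}^m u'_{x^+}(t_m)\, h$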
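To make the θ-scheme concrete, here is a minimal sketch for the simplest instance of Equation (A.5): the heat equation $\partial_t u = \partial_x^2 u$ on $[0,1]$ (that is $a = 1$ and $b = c = d = 0$) with the initial condition $u(0,x) = \sin(\pi x)$ and homogeneous Dirichlet conditions, for which the exact solution is $e^{-\pi^2 t}\sin(\pi x)$. The test problem, the function names and the grid sizes are assumptions made for the illustration; the tridiagonal system is the one above multiplied by $-1$ and is solved with the Thomas algorithm.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c_prime, d_prime = [0.0] * n, [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

def theta_scheme_heat(theta, n_x, n_t, t_max):
    """Theta-scheme for u_t = u_xx on [0, 1] with u = 0 on both boundaries."""
    h = 1.0 / (n_x - 1)
    k = t_max / (n_t - 1)
    alpha = gamma = k / h**2             # a = 1 and b = 0
    beta = 1.0 - 2.0 * k / h**2          # c = 0
    u = [math.sin(math.pi * i * h) for i in range(1, n_x - 1)]   # interior points
    n = len(u)
    lower = [-theta * alpha] * n
    diag = [1.0 - theta * (beta - 1.0)] * n
    upper = [-theta * gamma] * n
    for _ in range(n_t - 1):
        rhs = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0        # Dirichlet value at x = 0
            right = u[i + 1] if i < n - 1 else 0.0   # Dirichlet value at x = 1
            rhs.append((1.0 - theta) * alpha * left
                       + (1.0 + (1.0 - theta) * (beta - 1.0)) * u[i]
                       + (1.0 - theta) * gamma * right)
        u = thomas(lower, diag, upper, rhs)
    return u

exact_mid = math.exp(-math.pi**2 * 0.1)                  # exact u(0.1, 0.5)
u_cn = theta_scheme_heat(0.5, 51, 101, 0.1)              # Crank-Nicolson
u_implicit = theta_scheme_heat(1.0, 51, 101, 0.1)        # fully implicit
err_cn = abs(u_cn[24] - exact_mid)                       # interior index 24 is x = 0.5
err_implicit = abs(u_implicit[24] - exact_mid)
```

On this grid $k/h^2 = 2.5$, so the explicit scheme ($\theta = 0$) would not be a sensible choice, whereas both $\theta = 1/2$ and $\theta = 1$ remain stable; the Crank-Nicolson error is the smaller of the two.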
A.1.3 Numerical optimization
A.1.3.1 Quadratic programming problem 


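A quadratic programming (QP) problem is an optimization problem with a quadratic
objective function and linear inequality constraints:

$$\begin{aligned}
x^\star = {} & \arg\min \frac{1}{2} x^\top Q x - x^\top R \\
& \text{s.t.} \quad S x \leq T
\end{aligned} \tag{A.10}$$

where $x$ is an $n \times 1$ vector, $Q$ is an $n \times n$ matrix and $R$ is an $n \times 1$ vector. We note
that the system of constraints $Sx \leq T$ allows specifying linear equality constraints⁹ $Ax = B$
or weight constraints $x^- \leq x \leq x^+$. Most numerical packages then consider the following
formulation:

$$\begin{aligned}
x^\star = {} & \arg\min \frac{1}{2} x^\top Q x - x^\top R \\
& \text{s.t.} \quad \left\{ \begin{array}{l} Ax = B \\ Cx \leq D \\ x^- \leq x \leq x^+ \end{array} \right.
\end{aligned} \tag{A.11}$$

because the problem (A.11) is equivalent to the canonical problem (A.10) with the following
system of linear inequalities:

$$\begin{pmatrix} -A \\ A \\ C \\ -I_n \\ I_n \end{pmatrix} x \leq \begin{pmatrix} -B \\ B \\ D \\ -x^- \\ x^+ \end{pmatrix}$$

If the space defined by $Sx \leq T$ is non-empty and if $Q$ is a symmetric positive definite
matrix, the solution exists because the function $f(x) = \frac{1}{2} x^\top Q x - x^\top R$ is convex. In
the general case where $Q$ is a square matrix, the solution may not exist.

The Lagrange function is:

$$\mathcal{L}(x; \lambda) = \frac{1}{2} x^\top Q x - x^\top R + \lambda^\top (Sx - T)$$

We deduce that the dual problem is defined by:

$$\begin{aligned}
\lambda^\star = {} & \arg\max \left\{ \inf_x \mathcal{L}(x; \lambda) \right\} \\
& \text{s.t.} \quad \lambda \geq 0
\end{aligned}$$

We note that $\partial_x \mathcal{L}(x; \lambda) = Qx - R + S^\top \lambda$. The solution to the problem
$\partial_x \mathcal{L}(x; \lambda) = 0$ is then $x = Q^{-1} (R - S^\top \lambda)$. We obtain:

$$\begin{aligned}
\inf_x \mathcal{L}(x; \lambda) = {} & \frac{1}{2} \left( R^\top - \lambda^\top S \right) Q^{-1} \left( R - S^\top \lambda \right) - \left( R^\top - \lambda^\top S \right) Q^{-1} R + \lambda^\top \left( S Q^{-1} \left( R - S^\top \lambda \right) - T \right) \\
= {} & \frac{1}{2} R^\top Q^{-1} R - \lambda^\top S Q^{-1} R + \frac{1}{2} \lambda^\top S Q^{-1} S^\top \lambda - R^\top Q^{-1} R + 2 \lambda^\top S Q^{-1} R - \lambda^\top S Q^{-1} S^\top \lambda - \lambda^\top T \\
= {} & -\frac{1}{2} \lambda^\top S Q^{-1} S^\top \lambda + \lambda^\top \left( S Q^{-1} R - T \right) - \frac{1}{2} R^\top Q^{-1} R
\end{aligned}$$

⁹ This is equivalent to imposing that $Ax \geq B$ and $Ax \leq B$.

The dual program is another quadratic program:

$$\begin{aligned}
\lambda^\star = {} & \arg\min \frac{1}{2} \lambda^\top \bar{Q} \lambda - \lambda^\top \bar{R} \\
& \text{s.t.} \quad \lambda \geq 0
\end{aligned} \tag{A.12}$$

where $\bar{Q} = S Q^{-1} S^\top$ and $\bar{R} = S Q^{-1} R - T$.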
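Because the dual program (A.12) only involves the sign constraint $\lambda \geq 0$, it can be solved with very simple tools. The sketch below assumes NumPy and uses a projected gradient iteration, which is one possible choice among many and is not the method prescribed by the text; the function name `solve_qp_dual` and the small numerical example are hypothetical. The primal solution is recovered with $x = Q^{-1}(R - S^\top\lambda)$.

```python
import numpy as np

def solve_qp_dual(q, r, s, t, n_iter=5000):
    """Solve min 0.5 x'Qx - x'R s.t. Sx <= T through the dual program (A.12)."""
    q_inv = np.linalg.inv(q)
    q_bar = s @ q_inv @ s.T                            # Q_bar = S Q^{-1} S'
    r_bar = s @ q_inv @ r - t                          # R_bar = S Q^{-1} R - T
    step = 1.0 / np.linalg.eigvalsh(q_bar).max()       # step length 1 / lambda_max
    lam = np.zeros(len(t))
    for _ in range(n_iter):
        grad = q_bar @ lam - r_bar                     # gradient of the dual objective
        lam = np.maximum(lam - step * grad, 0.0)       # projection on lambda >= 0
    x = q_inv @ (r - s.T @ lam)                        # primal solution
    return x, lam

# Hypothetical example: minimize 0.5 ||x||^2 - x1 - x2 s.t. x1 + x2 <= 1 and x >= 0
q = np.eye(2)
r = np.array([1.0, 1.0])
s = np.array([[1.0, 1.0],      # C x <= D
              [-1.0, 0.0],     # -I_n x <= -x^-
              [0.0, -1.0]])
t = np.array([1.0, 0.0, 0.0])
x_star, lam_star = solve_qp_dual(q, r, s, t)
```

In this example the unconstrained optimum $(1, 1)$ violates the budget constraint, so only the first multiplier is positive and the solution is $(0.5, 0.5)$.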
A.1.3.2 Non-linear unconstrained optimization 
We consider the minimization problem:

$$x^\star = \arg\min f(x) \tag{A.13}$$

where $x \in \mathbb{R}^n$. Let $G(x)$ and $H(x)$ be the gradient vector and the Hessian matrix of
$f(x)$. The optimum verifies:

$$G(x^\star) = 0 \tag{A.14}$$

The first-order Taylor expansion of $G(x)$ around the point $x_0$ is given by:

$$G(x) = G(x_0) + H(x_0)(x - x_0)$$

If $x$ is the solution of Equation (A.14), we obtain $G(x_0) + H(x_0)(x - x_0) = 0$. The
Newton-Raphson algorithm uses an iterative process to find this root:

$$x_{k+1} = x_k - H_k^{-1} G_k$$

where $k$ is the iteration index, $G_k = G(x_k)$ and $H_k = H(x_k)$. Starting from an initial point
$x_0$, we find the solution $x^\star$ if the algorithm converges¹⁰. However, we generally prefer to
use the following process:

$$x_{k+1} = x_k - \lambda_k H_k^{-1} G_k = x_k + \lambda_k d_k$$

where $\lambda_k > 0$ is a scalar. The difference comes from the introduction of the step length
$\lambda_k$. Starting from the point $x_k$, the vector $d_k = -H_k^{-1} G_k$ indicates the direction to
reach the minimum. Nevertheless, using a step length equal to 1 is not always optimal. For
instance, we could exceed the optimum¹¹ or the convergence may be very slow. This is why
numerical optimization methods use two types of algorithms:

1. an algorithm to approximate the Hessian matrix $H_k$ and to compute the descent $d_k$;
2. a second algorithm to define the optimal step length $\lambda_k$:

$$\lambda_k = \arg\min_{\lambda > 0} f(x_k + \lambda d_k)$$

The Hessian approximation avoids singularity problems which are frequent in the
neighborhood of the optimum. Press et al. (2007) distinguished two algorithm families to define
the descent, namely conjugate gradient and quasi-Newton methods.


¹⁰ We stop the algorithm when the gradient is close to zero. For example, the stopping rule may be
$\max_i |G_{k,i}| \leq \varepsilon$ where $\varepsilon$ is the allowed tolerance.
¹¹ This means that $f$ does not necessarily decrease at each iteration.

• In the case of the conjugate gradient approach, we have:

$$d_{k+1} = - \left( G_{k+1} - \varrho_{k+1} d_k \right)$$

For the Fletcher-Reeves algorithm, the scalar $\varrho$ is given by:

$$\varrho_{k+1} = \frac{G_{k+1}^\top G_{k+1}}{G_k^\top G_k}$$

whereas for the Polak-Ribière algorithm, we have:

$$\varrho_{k+1} = \frac{\left( G_{k+1} - G_k \right)^\top G_{k+1}}{G_k^\top G_k}$$

• For quasi-Newton methods, the direction is defined as follows:

$$d_{k+1} = - \tilde{H}_{k+1} G_{k+1}$$

where $\tilde{H}$ is an approximation of the inverse of the Hessian matrix. Its expression is:

$$\tilde{H}_{k+1} = \tilde{H}_k - \frac{\tilde{H}_k y_k y_k^\top \tilde{H}_k}{y_k^\top \tilde{H}_k y_k} + \frac{s_k s_k^\top}{s_k^\top y_k} + \beta \left( \tilde{H}_k y_k - \theta_k s_k \right) \left( \tilde{H}_k y_k - \theta_k s_k \right)^\top$$

where $y_k = G_{k+1} - G_k$, $s_k = x_{k+1} - x_k$ and:

$$\theta_k = \frac{y_k^\top \tilde{H}_k y_k}{s_k^\top y_k}$$

The Davidon, Fletcher and Powell (DFP) algorithm corresponds to $\beta = 0$, whereas
the Broyden, Fletcher, Goldfarb and Shanno (BFGS) algorithm is given by:

$$\beta = \frac{1}{y_k^\top \tilde{H}_k y_k}$$

To find the optimal value of $\lambda_k$, we employ a simple one-dimensional minimization
algorithm¹² such as the golden section, Brent's method or the cubic spline approximation
(Press et al., 2007).
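The iteration $x_{k+1} = x_k + \lambda_k d_k$ can be sketched as follows. This is a minimal illustration that assumes NumPy and uses the exact Hessian with a step-halving rule for $\lambda_k$, rather than a quasi-Newton approximation or an exact line search; the function name `newton_min` and the test function are hypothetical.

```python
import numpy as np

def newton_min(f, grad, hess, x0, tol=1e-10, max_iter=100):
    """Newton-Raphson minimization with a step length obtained by successive halving."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.max(np.abs(g)) <= tol:          # stopping rule on the gradient
            break
        d = -np.linalg.solve(hess(x), g)      # d_k = -H_k^{-1} G_k
        lam = 1.0
        while f(x + lam * d) >= f(x) and lam > 1e-12:
            lam *= 0.5                        # lambda = 1, 1/2, 1/4, ...
        x = x + lam * d
    return x

# Hypothetical convex test function with minimum at (0, 1)
f = lambda x: np.exp(x[0]) - x[0] + (x[1] - 1.0) ** 2
grad = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * (x[1] - 1.0)])
hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])

x_min = newton_min(f, grad, hess, x0=[2.0, -3.0])
```

The quadratic component is solved in one step, while the exponential component needs a few iterations; replacing `hess` by a DFP or BFGS update of the inverse Hessian would leave the structure of the loop unchanged.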


Remark 209 Newton's method may also be used to solve non-linear optimization problems
with linear constraints:

$$\begin{aligned}
x^\star = {} & \arg\min f(x) \\
& \text{s.t.} \quad Ax = B
\end{aligned}$$

Indeed, this constrained problem is equivalent to the following unconstrained problem:

$$y^\star = \arg\min g(y)$$

where $g(y) = f(Cy + D)$, $C$ is an orthonormal basis for the nullspace of $A$,
$D = \left( A^\top A \right)^{+} A^\top B$ and $\left( A^\top A \right)^{+}$ is the Moore-Penrose
pseudo-inverse of $A^\top A$. The solution is then:

$$x^\star = C y^\star + D$$

¹² Computing the optimal value of $\lambda_k$ may be time consuming. In this case, we may also
prefer the half method, which consists in dividing the test value by two each time the function
fails to decrease ($\lambda$ then takes the respective values 1, 1/2, 1/4, 1/8, etc.) and stopping
when the criterion $f(x_k + \lambda_k d_k) < f(x_k)$ is satisfied.

A.1.3.3 Sequential quadratic programming algorithm


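The sequential quadratic programming (or SQP) algorithm solves this constrained
non-linear programming problem:

$$\begin{aligned}
x^\star = {} & \arg\min f(x) \\
& \text{s.t.} \quad \left\{ \begin{array}{l} A(x) = 0 \\ B(x) \geq 0 \end{array} \right.
\end{aligned} \tag{A.15}$$

where $A(x)$ and $B(x)$ are two multidimensional non-linear functions. Like Newton's
methods, this algorithm is an iterative process:

$$x_{k+1} = x_k + \lambda_k d_k$$

where:

$$\begin{aligned}
d_k = {} & \arg\min \frac{1}{2} d^\top H_k d + d^\top G_k \\
& \text{s.t.} \quad \left\{ \begin{array}{l} \partial_x A(x_k)\, d + A(x_k) = 0 \\ \partial_x B(x_k)\, d + B(x_k) \geq 0 \end{array} \right.
\end{aligned}$$

It consists in replacing the non-linear optimization problem by a sequence of quadratic
programming problems (Boggs and Tolle, 1995). The QP problem corresponds to the
second-order Taylor expansion of $f(x)$:

$$f(x_k + \delta) \approx f(x_k) + \delta^\top G_k + \frac{1}{2} \delta^\top H_k \delta$$

with the linearized constraints:

$$\left\{ \begin{array}{l} A(x_k + \delta) \approx A(x_k) + \partial_x A(x_k)\, \delta = 0 \\ B(x_k + \delta) \approx B(x_k) + \partial_x B(x_k)\, \delta \geq 0 \end{array} \right.$$

and $\delta = \lambda d$. We can use quasi-Newton methods to approximate the Hessian matrix $H_k$.
However, if we define $\lambda_k$ as previously:

$$\lambda_k = \arg\min_{\lambda > 0} f(x_k + \lambda d_k)$$

we may face some problems because the constraints $A(x) = 0$ and $B(x) \geq 0$ are not
necessarily satisfied. This is why we prefer to specify $\lambda_k$ as the solution to the
one-dimensional minimization problem:

$$\lambda_k = \arg\min_{\lambda > 0} m(x_k + \lambda d_k)$$

where $m(x)$ is the merit function:

$$m(x) = f(x) + p_A \sum_j \left| A_j(x) \right| - p_B \sum_j \min\left( 0, B_j(x) \right)$$

We generally choose the penalization weights $p_A$ and $p_B$ as the infinite norm of Lagrange
coefficients associated with linear and non-linear constraints (Nocedal and Wright, 2006).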
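The merit function is straightforward to code. The sketch below uses only plain Python; the helper name `merit`, the objective, the constraints and the penalization weights are hypothetical values chosen for the illustration.

```python
def merit(f, eq_constraints, ineq_constraints, p_a, p_b):
    """Build m(x) = f(x) + p_A * sum |A_j(x)| - p_B * sum min(0, B_j(x))."""
    def m(x):
        penalty_eq = sum(abs(a(x)) for a in eq_constraints)            # violation of A(x) = 0
        penalty_ineq = sum(min(0.0, b(x)) for b in ineq_constraints)   # violation of B(x) >= 0
        return f(x) + p_a * penalty_eq - p_b * penalty_ineq
    return m

# Hypothetical problem: minimize x1^2 + x2^2 s.t. x1 + x2 - 1 = 0 and x1 - 0.8 >= 0
f = lambda x: x[0] ** 2 + x[1] ** 2
m = merit(f,
          eq_constraints=[lambda x: x[0] + x[1] - 1.0],
          ineq_constraints=[lambda x: x[0] - 0.8],
          p_a=10.0, p_b=10.0)

m_feasible = m((0.8, 0.2))     # no penalty: the merit function equals f
m_infeasible = m((0.0, 0.0))   # both constraints are violated
```

At a feasible point the merit function coincides with the objective, whereas the unconstrained minimum $(0, 0)$ is heavily penalized, which is what prevents the line search from drifting away from the feasible set.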
A.1.3.4 Dynamic programming in discrete time with finite states 


We note $k$ the discrete time where $k \in \{1, \ldots, K\}$. Let $s(k)$ and $c(k)$ be the state and
control variables. We assume that the state variable evolves according to the dynamics:

$$s(k+1) = g\left( k, s(k), c(k) \right)$$

The state variable $s(k+1)$ only depends on the previous value $s(k)$ and not on the entire
path of the state. However, it can be controlled thanks to $c(k)$. Knowing the initial value
of the state $s(1) = s$, we would like to find the optimal control $c^\star(k)$ that maximizes the
additive gain function:

$$\left\{ c^\star(k) \right\}_{k=1}^{K-1} = \arg\max J\left( s, c(1), \ldots, c(K-1) \right) \tag{A.16}$$

where:

$$J\left( s, c(1), \ldots, c(K-1) \right) = \sum_{k=1}^{K-1} f\left( k, s(k), c(k) \right) + f\left( K, s(K) \right)$$

$f(k, s(k), c(k))$ is the gain function at time $k$, whereas $f(K, s(K))$ is the terminal gain.
We impose that the state and control variables satisfy some constraints: $s(k) \in \mathcal{S}(k)$ and
$c(k) \in \mathcal{C}(k)$. The optimization problem becomes:

$$\begin{aligned}
\left\{ c^\star(k) \right\}_{k=1}^{K-1} = {} & \arg\max \sum_{k=1}^{K-1} f\left( k, s(k), c(k) \right) + f\left( K, s(K) \right) \\
& \text{s.t.} \quad s(k+1) = g\left( k, s(k), c(k) \right),\; s(k) \in \mathcal{S}(k),\; c(k) \in \mathcal{C}(k)
\end{aligned} \tag{A.17}$$

A policy $\pi = \{\mu(1), \ldots, \mu(K-1)\}$ is described by functions $\mu(k) = \mu(k, s(k))$ that map
states into controls (Bertsekas, 2005). The optimal policy $\pi^\star$ is then defined as follows:

$$\begin{aligned}
\pi^\star = {} & \arg\max \sum_{k=1}^{K-1} f\left( k, s(k), \mu(k, s(k)) \right) + f\left( K, s(K) \right) \\
& \text{s.t.} \quad s(k+1) = g\left( k, s(k), \mu(k, s(k)) \right),\; s(k) \in \mathcal{S}(k),\; \mu(k, s(k)) \in \mathcal{C}(k)
\end{aligned} \tag{A.18}$$

This problem may be solved with the method of dynamic programming introduced by
Bellman (1957). Let $\kappa \in \mathbb{N}$ such that $1 \leq \kappa \leq K - 1$. We consider the tail
subproblem:

$$\pi^\star(\kappa) = \arg\max \sum_{k=\kappa}^{K-1} f\left( k, s(k), \mu(k, s(k)) \right) + f\left( K, s(K) \right) \tag{A.19}$$

with the same set of constraints as those used for Problem (A.18). Bellman's optimality
principle states that if $\pi^\star = \{\mu^\star(1), \ldots, \mu^\star(K-1)\}$ is an optimal policy for Problem
(A.18), then the tail policy $\pi^\star(\kappa) = \{\mu^\star(\kappa), \ldots, \mu^\star(K-1)\}$ is an optimal
policy for Problem (A.19). Therefore, we can solve Problem (A.18) using a backward algorithm,
which is characterized by a set of recursive optimizations (Bertsekas, 2005):


1. at the terminal date, we have:

$$\mathcal{J}\left( K, s(K) \right) = f\left( K, s(K) \right)$$

2. at the intermediate date $k < K$, we have:

$$\mathcal{J}\left( k, s(k) \right) = \sup_{c(k) \in \mathcal{C}(k)} \left\{ f\left( k, s(k), c(k) \right) + \mathcal{J}\left( k+1, g\left( k, s(k), c(k) \right) \right) \right\} \tag{A.20}$$

In the finite case where there are $n_S$ states and $n_C$ controls, the previous algorithm is
simplified. Let $\{s_1, \ldots, s_{n_S}\}$ and $\{c_1, \ldots, c_{n_C}\}$ be the values taken by $s(k)$ and
$c(k)$. We note $\mathcal{J}(K, s_i)$ the terminal value when $s(K) = s_i$. We store the solutions
$\mathcal{J}(k, s(k))$ and $c^\star(k)$ into the matrices $J$ and $C$. The DP algorithm becomes:

1. We initialize the algorithm by $k = K$.

2. At the time $K$, we know the terminal value $\mathcal{J}(K, s(K))$. Therefore, we initialize the
element $(i, K)$ of the matrix $J$ to $\mathcal{J}(K, s_i)$.

3. We set $k \leftarrow k - 1$.

4. At the date $k < K$, we calculate for each state $s_i$ the value taken by $\mathcal{J}(k, s_i)$:

$$\mathcal{J}(k, s_i) = \sup_{1 \leq j \leq n_C} \left\{ f(k, s_i, c_j) + \mathcal{J}(k+1, s') \right\}$$

where $s' = g(k, s_i, c_j)$. By construction, $s'$ corresponds to a state $s_{i'}$. We deduce that
$\mathcal{J}(k+1, s')$ is equal to the element $(i', k+1)$ of the matrix $J$. Moreover, the optimal
control $c^\star(k)$ is given by:

$$c^\star(k) = \arg\max_{1 \leq j \leq n_C} \left\{ f(k, s_i, c_j) + \mathcal{J}(k+1, s') \right\}$$

We deduce that the elements $(i, k)$ of the matrices $J$ and $C$ are $\mathcal{J}(k, s_i)$ and $c^\star(k)$.

5. If $k = 1$, we stop the algorithm. Otherwise, we go to step 3.
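The five steps above can be sketched in a few lines of plain Python. The function name `backward_dp` and the toy problem (consuming a stock of three units over three decision dates with gain $\sqrt{c}$ and no terminal gain) are hypothetical; controls that lead outside the state space are treated as inadmissible.

```python
import math

def backward_dp(states, controls, n_dates, gain, terminal_gain, transition):
    """Backward recursion; value[k - 1][i] stores J(k, s_i) and policy[k - 1][i] stores c*(k)."""
    index = {s: i for i, s in enumerate(states)}
    value = [[None] * len(states) for _ in range(n_dates)]
    policy = [[None] * len(states) for _ in range(n_dates - 1)]
    value[n_dates - 1] = [terminal_gain(s) for s in states]      # steps 1 and 2
    for k in range(n_dates - 1, 0, -1):                          # steps 3 and 5
        for i, s in enumerate(states):                           # step 4
            best_val, best_c = float("-inf"), None
            for c in controls:
                s_next = transition(k, s, c)
                if s_next not in index:                          # inadmissible control
                    continue
                val = gain(k, s, c) + value[k][index[s_next]]    # f(k, s_i, c_j) + J(k + 1, s')
                if val > best_val:
                    best_val, best_c = val, c
            value[k - 1][i] = best_val
            policy[k - 1][i] = best_c
    return value, policy

# Hypothetical example: stock s in {0, 1, 2, 3}, consumption c in {0, 1, 2}, K = 4
states = [0, 1, 2, 3]
controls = [0, 1, 2]
value, policy = backward_dp(
    states, controls, n_dates=4,
    gain=lambda k, s, c: math.sqrt(c),
    terminal_gain=lambda s: 0.0,
    transition=lambda k, s, c: s - c,
)
```

Because the gain is concave, the optimal policy starting from a stock of three units spreads the consumption evenly (one unit per date), for a total gain of 3.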


A.2 Statistical and probability analysis 
A.2.1 Probability distributions 
A.2.1.1 The Bernoulli distribution 


The Bernoulli random variable X takes the value 1 with success probability of p and 
the value 0 with failure probability of q = 1 — p. We note X ~ B (p). The probability mass 
function may also be expressed as follows: 


$$\Pr\{X = k\} = p^k (1 - p)^{1-k} \quad \text{with } k = 0, 1$$


We have E[X] = p and var (X) = p (1 — p). 


A.2.1.2 The binomial distribution 


The binomial random variable $X$ is the sum of $n$ independent Bernoulli random variables
with the same probability of success $p$:

$$X = \sum_{i=1}^{n} X_i \quad \text{where } X_i \sim \mathcal{B}(p)$$

We note $X \sim \mathcal{B}(n, p)$. The probability mass function is equal to:

$$\Pr\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k} \quad \text{with } k = 0, 1, \ldots, n$$

We have $\mathbb{E}[X] = np$ and $\mathrm{var}(X) = np(1 - p)$.

A.2.1.3 The geometric distribution


The geometric random variable $X$ is the number of Bernoulli trials needed to get one
success. We note $X \sim \mathcal{G}(p)$. The probability mass function is equal to:

$$\Pr\{X = k\} = (1 - p)^{k-1} p \quad \text{with } k \in \mathbb{N}^\star$$

We have $\mathbb{E}[X] = 1/p$ and $\mathrm{var}(X) = (1 - p)/p^2$.


Remark 210 If we define $X$ as the number of failures before the first success, we have
$\Pr\{X = k\} = (1 - p)^k p$ with $k \in \mathbb{N}$, $\mathbb{E}[X] = (1 - p)/p$ and $\mathrm{var}(X) = (1 - p)/p^2$.
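The moments of these discrete distributions are easy to check by summing over the probability mass function. The sketch below uses only the Python standard library; the parameter values and the truncation of the geometric support at 2000 are arbitrary choices made for the illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Pr{X = k} for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """Pr{X = k} when X is the number of trials needed to get one success (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

n, p = 20, 0.3

support_b = range(n + 1)
mean_b = sum(k * binomial_pmf(k, n, p) for k in support_b)
var_b = sum((k - mean_b) ** 2 * binomial_pmf(k, n, p) for k in support_b)

support_g = range(1, 2000)   # the neglected tail mass is numerically zero
mean_g = sum(k * geometric_pmf(k, p) for k in support_g)
var_g = sum((k - mean_g) ** 2 * geometric_pmf(k, p) for k in support_g)
```

The four sums reproduce $np$, $np(1-p)$, $1/p$ and $(1-p)/p^2$ up to rounding errors.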


A.2.1.4 The Poisson distribution 


The Poisson random variable $X$ is the number of times an event occurs in the unit
interval of time. We note $X \sim \mathcal{P}(\lambda)$ where $\lambda$ is the parameter of the Poisson
distribution. The probability mass function is equal to:

$$\Pr\{X = k\} = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{with } k \in \mathbb{N}$$

We have $\mathbb{E}[X] = \mathrm{var}(X) = \lambda$. The parameter $\lambda$ is then the expected number
of events occurring in the unit interval of time.


A.2.1.5 The negative binomial distribution 


The negative binomial distribution is another probability distribution for modeling the
frequency of an event. We note $X \sim \mathcal{NB}(r, p)$ where $r > 0$ and $p \in [0,1]$. The
probability mass function is equal to:

$$\Pr\{X = k\} = \binom{r + k - 1}{k} (1 - p)^r p^k \quad \text{with } k \in \mathbb{N}$$

We have $\mathbb{E}[X] = pr/(1 - p)$ and $\mathrm{var}(X) = pr/(1 - p)^2$.


A.2.1.6 The gamma distribution 


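The gamma distribution is a two-parameter family of continuous probability distributions,
whose support is $[0, \infty)$. We note $X \sim \mathcal{G}(\alpha, \beta)$ where $\alpha > 0$ and
$\beta > 0$. $\alpha$ and $\beta$ are called the shape parameter and the rate parameter. The
probability density function is equal to:

$$f(x) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$$

where $\Gamma(\alpha)$ is the gamma function defined as:

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\,\mathrm{d}t$$

The cumulative distribution function is the regularized gamma function:

$$F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}$$

where $\gamma(\alpha, x)$ is the lower incomplete gamma function defined as:

$$\gamma(\alpha, x) = \int_0^x t^{\alpha - 1} e^{-t}\,\mathrm{d}t$$

We have $\mathbb{E}[X] = \alpha/\beta$ and $\mathrm{var}(X) = \alpha/\beta^2$. We verify the following
properties:

• $\mathcal{G}(1, \beta) \sim \mathcal{E}(\beta)$;
• if $X \sim \mathcal{G}(\alpha, \beta)$, then $cX \sim \mathcal{G}(\alpha, \beta/c)$ when $c > 0$;
• $\sum_{i=1}^{n} \mathcal{G}(\alpha_i, \beta) \sim \mathcal{G}\left( \sum_{i=1}^{n} \alpha_i, \beta \right)$.


Remark 211 The standard gamma distribution corresponds to $\mathcal{G}(\alpha, 1)$ and is denoted
by $\mathcal{G}(\alpha)$.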


A.2.1.7 The beta distribution 


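The beta distribution is a two-parameter family of continuous probability distributions
defined on the interval $[0,1]$. We note $X \sim \mathcal{B}(\alpha, \beta)$ where $\alpha > 0$ and
$\beta > 0$. The probability density function is equal to:

$$f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{\mathfrak{B}(\alpha, \beta)}$$

where $\mathfrak{B}(\alpha, \beta)$ is the beta function defined as:

$$\mathfrak{B}(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\,\mathrm{d}t = \frac{\Gamma(\alpha)\, \Gamma(\beta)}{\Gamma(\alpha + \beta)}$$

The cumulative distribution function is the regularized incomplete beta function:

$$F(x) = I_{\mathfrak{B}}(x; \alpha, \beta) = \frac{\mathfrak{B}(x; \alpha, \beta)}{\mathfrak{B}(\alpha, \beta)}$$

where $\mathfrak{B}(x; \alpha, \beta)$ is the incomplete beta function defined as:

$$\mathfrak{B}(x; \alpha, \beta) = \int_0^x t^{\alpha - 1} (1 - t)^{\beta - 1}\,\mathrm{d}t$$

We have $\mathbb{E}[X] = \alpha/(\alpha + \beta)$ and:

$$\mathrm{var}(X) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
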
A.2.1.8 The noncentral chi-squared distribution 


Let $(X_1, \ldots, X_\nu)$ be a set of independent Gaussian random variables such that
$X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$. The noncentral chi-squared random variable is defined as
follows:

$$Y = \sum_{i=1}^{\nu} \frac{X_i^2}{\sigma_i^2}$$

We write $Y \sim \chi_\nu^2(\zeta)$ where $\nu$ is the number of degrees of freedom and $\zeta$ is the
noncentrality parameter:

$$\zeta = \sum_{i=1}^{\nu} \frac{\mu_i^2}{\sigma_i^2}$$

The cumulative distribution function of $Y$ is defined as:

$$F(y; \nu, \zeta) = \Pr\{Y \leq y\} = \sum_{j=0}^{\infty} \frac{e^{-\zeta/2} \zeta^j}{2^j j!}\, F(y; \nu + 2j, 0)$$

where $F(y; \nu, 0)$ is the cumulative distribution function of the chi-squared distribution with
$\nu$ degrees of freedom. We deduce that the probability density function is:

$$f(y; \nu, \zeta) = \sum_{j=0}^{\infty} \frac{e^{-\zeta/2} \zeta^j}{2^j j!}\, f(y; \nu + 2j, 0)$$

where $f(y; \nu, 0)$ is the probability density function of the chi-squared distribution. We may
also show that the mean and the variance of $Y$ are $\nu + \zeta$ and $2(\nu + 2\zeta)$. For the skewness
and excess kurtosis coefficients, we obtain:

$$\gamma_1 = (\nu + 3\zeta) \sqrt{\frac{2^3}{(\nu + 2\zeta)^3}}, \qquad \gamma_2 = \frac{12\, (\nu + 4\zeta)}{(\nu + 2\zeta)^2}$$

Remark 212 When all the $\mu_i$ are equal to zero, $Y$ becomes a (central) chi-squared distribution
$\chi_\nu^2(0)$. The density function is equal to:

$$f(y; \nu, 0) = \frac{y^{\nu/2 - 1} e^{-y/2}}{2^{\nu/2}\, \Gamma(\nu/2)}$$

whereas the cumulative distribution function has the following expression:

$$F(y; \nu, 0) = \frac{\gamma(\nu/2, y/2)}{\Gamma(\nu/2)}$$

A.2.1.9 The exponential distribution 


$X$ is an exponential random variable $\mathcal{E}(\lambda)$ if the density function is
$f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$. We deduce that $F(x) = 1 - e^{-\lambda x}$. We have
$\mathbb{E}[X] = 1/\lambda$ and $\mathrm{var}(X) = 1/\lambda^2$. More generally, we can show that
$\mathbb{E}[X^n] = n!/\lambda^n$. This distribution verifies the lack of memory property:

$$\Pr\{X \geq s + t \mid X \geq s\} = \Pr\{X \geq t\}$$

for all $s \geq 0$ and $t \geq 0$.


A.2.1.10 The normal distribution 


Let $C$ be a correlation matrix. We consider the standardized Gaussian random vector
$X \sim \mathcal{N}(0, C)$ of dimension $n$. We note $\phi_n(x; C)$ the associated density function
defined as:

$$\phi_n(x; C) = (2\pi)^{-n/2}\, |C|^{-1/2} \exp\left( -\frac{1}{2} x^\top C^{-1} x \right)$$

We deduce that the expression of the cumulative distribution function is:

$$\Phi_n(x; C) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} \phi_n(u; C)\,\mathrm{d}u$$

By construction, we have $\mathbb{E}[X] = 0$ and $\mathrm{cov}(X) = C$. In the bivariate case, we use
the notations $\phi_2(x_1, x_2; \rho) = \phi_2(x; C)$ and $\Phi_2(x_1, x_2; \rho) = \Phi_2(x; C)$ where
$\rho = C_{1,2}$ is the correlation between the components $X_1$ and $X_2$. In the univariate case, we
also consider the alternative notations $\phi(x) = \phi_1(x; 1)$ and $\Phi(x) = \Phi_1(x; 1)$. The
density function then reduces to:

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} x^2 \right)$$

Concerning the moments, we have $\mu(X) = 0$, $\sigma(X) = 1$, $\gamma_1(X) = 0$ and $\gamma_2(X) = 0$.
Adding a mean vector $\mu$ and a covariance matrix $\Sigma$ is equivalent to applying the following
linear transformation to $X$:

$$Y = \mu + \sigma X$$

where $\sigma = \mathrm{diag}^{1/2}(\Sigma)$.


A.2.1.11 The Student’s ¢ distribution 


Let $X \sim \mathcal{N}(0, \Sigma)$ and $V \sim \chi_\nu^2/\nu$ be independent of $X$. We define the
multivariate Student's t distribution as the one corresponding to the linear transformation:

$$Y = V^{-1/2} X$$

The corresponding density function is:

$$t_n(y; \Sigma, \nu) = \frac{\Gamma\left( (\nu + n)/2 \right)}{\Gamma(\nu/2)\, (\nu\pi)^{n/2}}\, |\Sigma|^{-1/2} \left( 1 + \frac{1}{\nu}\, y^\top \Sigma^{-1} y \right)^{-(\nu + n)/2}$$

We note $T_n(y; \Sigma, \nu)$ the cumulative distribution function:

$$T_n(y; \Sigma, \nu) = \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_n} t_n(u; \Sigma, \nu)\,\mathrm{d}u$$

The first two moments¹³ of $Y$ are $\mathbb{E}[Y] = 0$ and $\mathrm{cov}(Y) = \nu (\nu - 2)^{-1} \Sigma$.
Adding a mean $\mu$ is equivalent to considering the random vector $Z = \mu + Y$. We also verify
that $Y$ tends to the Gaussian random vector $X$ when the number of degrees of freedom tends to
$\infty$.

In the univariate case, the standardized density function becomes:

$$t_1(y; \nu) = \frac{\Gamma\left( (\nu + 1)/2 \right)}{\Gamma(\nu/2) \sqrt{\nu\pi}} \left( 1 + \frac{y^2}{\nu} \right)^{-(\nu + 1)/2}$$

We also use the alternative notations $t_\nu(y) = t_1(y; \nu)$ and $T_\nu(y) = T_1(y; \nu)$. Concerning
the moments¹⁴, we obtain $\mu(Y) = 0$, $\sigma^2(Y) = \nu/(\nu - 2)$, $\gamma_1(Y) = 0$ and
$\gamma_2(Y) = 6/(\nu - 4)$.

¹³ The second moment is not defined if $\nu \leq 2$.
¹⁴ The skewness is not defined if $\nu \leq 3$ whereas the excess kurtosis is infinite if $2 < \nu \leq 4$.


A.2.1.12 The log-normal distribution


Let $Z \sim \mathcal{N}(\mu, \sigma^2)$ be a normally distributed random variable. $X = e^Z$ is a
log-normal random variable and we note $X \sim \mathcal{LN}(\mu, \sigma^2)$. The probability density
function is equal to:

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{\ln x - \mu}{\sigma} \right)^2 \right)$$

whereas the cumulative distribution function has the following expression:

$$F(x) = \Phi\left( \frac{\ln x - \mu}{\sigma} \right)$$

We have:

$$\mathbb{E}[X] = e^{\mu + \frac{1}{2}\sigma^2}$$

and:

$$\mathrm{var}(X) = e^{2\mu + \sigma^2} \left( e^{\sigma^2} - 1 \right)$$


A.2.1.13 The Pareto distribution 
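The Pareto distribution is denoted by $\mathcal{P}(\alpha, x_-)$. We have:

$$f(x) = \frac{\alpha x_-^\alpha}{x^{\alpha + 1}}$$

and:

$$F(x) = 1 - \left( \frac{x}{x_-} \right)^{-\alpha}$$

where $x \geq x_-$, $\alpha > 0$ and $x_- > 0$. Concerning the first two moments, we obtain:

$$\mathbb{E}[X] = \frac{\alpha x_-}{\alpha - 1}$$

if $\alpha > 1$ and:

$$\mathrm{var}(X) = \frac{\alpha x_-^2}{(\alpha - 1)^2 (\alpha - 2)}$$

if $\alpha > 2$.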


Remark 213 The Pareto distribution can be parameterized as follows:

$$F(x) = 1 - \left( \frac{\theta + x}{\theta} \right)^{-\alpha}$$

where $x \geq 0$, $\alpha > 0$ and $\theta > 0$. In this case, it is denoted by $\mathcal{P}(\alpha, \theta)$.


A.2.1.14 The generalized extreme value distribution 


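The generalized extreme value distribution is denoted by $\mathcal{GEV}(\mu, \sigma, \xi)$. We have:

$$f(x) = \frac{1}{\sigma} \left( 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right)^{-(1 + \xi)/\xi} \exp\left( - \left( 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right)^{-1/\xi} \right)$$

and:

$$F(x) = \exp\left( - \left( 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right)^{-1/\xi} \right)$$

where $x > \mu - \sigma/\xi$, $\sigma > 0$ and $\xi > 0$. Concerning the first two moments, we obtain:

$$\mathbb{E}[X] = \mu + \frac{\sigma}{\xi} \left( \Gamma(1 - \xi) - 1 \right)$$

if $\xi < 1$ and:

$$\mathrm{var}(X) = \frac{\sigma^2}{\xi^2} \left( \Gamma(1 - 2\xi) - \Gamma^2(1 - \xi) \right)$$

if $\xi < 1/2$.

A.2.1.15 The generalized Pareto distribution

The generalized Pareto distribution is denoted by $\mathcal{GPD}(\sigma, \xi)$. We have:

$$f(x) = \frac{1}{\sigma} \left( 1 + \frac{\xi x}{\sigma} \right)^{-1/\xi - 1}$$

and:

$$F(x) = 1 - \left( 1 + \frac{\xi x}{\sigma} \right)^{-1/\xi}$$

where $x \geq 0$, $\sigma > 0$ and $\xi > 0$. Concerning the first two moments, we obtain:

$$\mathbb{E}[X] = \frac{\sigma}{1 - \xi}$$

if $\xi < 1$ and:

$$\mathrm{var}(X) = \frac{\sigma^2}{(1 - \xi)^2 (1 - 2\xi)}$$

if $\xi < 1/2$.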

A.2.1.16 The skew normal distribution 


The seminal work of Azzalini (1985) has led to a rich development on skew distributions 
with numerous forms, parameterizations and extensions!°®. We adopt here the construction 
of Azzalini and Dalla Valle (1996). 


The multivariate case Azzalini and Dalla Valle (1996) define the density function of 
the skew normal (or SN) distribution as follows: 


f (x) = 2dn (x — £ Q) O1 (n'om! (= é)) 


with w = diag’? (Q). We say that X follows a multivariate skew normal distribution with 
parameters £, Q and 7 and we write X ~ SN (€,0,7). We notice that the distribution of 
X ~ SN, (€,9,0) is the standard normal distribution M (£, Q). We verify the property 
X = £&+wY where Y ~ SN (0,C,7) and C = w!Qw7! is the correlation matrix of Q. 
Azzalini and Dalla Valle (1996) demonstrated that the first two moments are: 


EX] = €+ [2a 


cov (X) = w (c — 55") w! 


where 6 = (1+ n'Cn) 7 Cn. 
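Azzalini and Capitanio (1999) showed that $Y \sim \mathcal{SN}(0, C, \eta)$ has the following stochastic representation:

$$Y = \begin{cases} U & \text{if } U_0 > 0 \\ -U & \text{otherwise} \end{cases}$$

where:

$$\begin{pmatrix} U_0 \\ U \end{pmatrix} \sim \mathcal{N}_{n+1}\left(0, C_+(\delta)\right), \qquad C_+(\delta) = \begin{pmatrix} 1 & \delta^\top \\ \delta & C \end{pmatrix}$$

and $\delta = (1 + \eta^\top C\eta)^{-1/2}C\eta$. We deduce that:

$$\begin{aligned}
\Pr\{X \leq x\} &= \Pr\{Y \leq \omega^{-1}(x - \xi)\} \\
&= \Pr\{U \leq \omega^{-1}(x - \xi) \mid U_0 > 0\} \\
&= \frac{\Pr\{U \leq \omega^{-1}(x - \xi), U_0 > 0\}}{\Pr\{U_0 > 0\}} \\
&= 2\left(\Pr\{U \leq \omega^{-1}(x - \xi)\} - \Pr\{U \leq \omega^{-1}(x - \xi), U_0 \leq 0\}\right) \\
&= 2\left(\Phi_n(\omega^{-1}(x - \xi); C) - \Phi_{n+1}(u_+; C_+(\delta))\right) \\
&= 2\,\Phi_{n+1}(u_+; C_+(-\delta))
\end{aligned}$$

where $u_+ = (0, \omega^{-1}(x - \xi))$. We can therefore use this representation to simulate the random vector $X \sim \mathcal{SN}(\xi, \Omega, \eta)$ and compute the cumulative distribution function.

15 See for instance Arellano-Valle and Genton (2005) and Lee and Wang (2013) for a review.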
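The simulation step mentioned above can be sketched in the one-dimensional case, where $C = 1$ and $\delta = \eta/\sqrt{1+\eta^2}$. The function names and parameter values below are assumptions made for the illustration; the check uses the first moment $\xi + \sqrt{2/\pi}\,\omega\delta$ given earlier.

```python
import math
import random


def sample_skew_normal(xi, omega, eta, rng):
    """Draw X ~ SN(xi, omega^2, eta) with Y = U if U0 > 0, and Y = -U otherwise."""
    delta = eta / math.sqrt(1.0 + eta * eta)
    u0 = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    # (U0, U) is a standard bivariate Gaussian vector with correlation delta.
    u = delta * u0 + math.sqrt(1.0 - delta * delta) * z
    y = u if u0 > 0 else -u
    return xi + omega * y


def skew_normal_mean(xi, omega, eta):
    """Theoretical mean xi + sqrt(2 / pi) * omega * delta."""
    delta = eta / math.sqrt(1.0 + eta * eta)
    return xi + omega * delta * math.sqrt(2.0 / math.pi)


# Illustrative parameters (hypothetical values).
rng = random.Random(123)
xi, omega, eta = 1.0, 2.0, 3.0
draws = [sample_skew_normal(xi, omega, eta, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean, skew_normal_mean(xi, omega, eta))
```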
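Let $A$ be an $m \times n$ matrix and $X \sim \mathcal{SN}(\xi, \Omega, \eta)$. Azzalini and Capitanio (1999) demonstrated that the linear transformation of a skew normal vector is still a skew normal vector:

$$AX \sim \mathcal{SN}(\xi_A, \Omega_A, \eta_A)$$

where:

$$\xi_A = A\xi, \qquad \Omega_A = A\Omega A^\top, \qquad \eta_A = \frac{\omega_A\Omega_A^{-1}B^\top\eta}{\left(1 + \eta^\top\left(C - B\Omega_A^{-1}B^\top\right)\eta\right)^{1/2}}$$

with $\omega = \mathrm{diag}^{1/2}(\Omega)$, $C = \omega^{-1}\Omega\omega^{-1}$, $\omega_A = \mathrm{diag}^{1/2}(\Omega_A)$ and $B = \omega^{-1}\Omega A^\top$. This property also implies that the marginal distribution of a subset of $X$ is still a skew normal distribution.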


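The univariate case When the dimension $n$ is equal to 1, the density function of $X \sim \mathcal{SN}(\xi, \omega^2, \eta)$ becomes:

$$f(x) = \frac{2}{\omega}\,\phi\left(\frac{x-\xi}{\omega}\right)\Phi\left(\eta\,\frac{x-\xi}{\omega}\right)$$

Using the previous stochastic representation, we have:

$$\begin{aligned}
\Pr\{X \leq x\} &= 2\left(\Phi\left(\frac{x-\xi}{\omega}\right) - \Phi_2\left(0, \frac{x-\xi}{\omega}; \delta\right)\right) \\
&= 2\,\Phi_2\left(0, \frac{x-\xi}{\omega}; -\delta\right)
\end{aligned}$$

where:

$$\delta = \frac{\eta}{\sqrt{1+\eta^2}}$$

We note $m_0 = \delta\sqrt{2/\pi}$. The moments of the univariate SN distribution are:

$$\begin{aligned}
\mu(X) &= \xi + \omega m_0 \\
\sigma^2(X) &= \omega^2\left(1 - m_0^2\right) \\
\gamma_1(X) &= \left(\frac{4-\pi}{2}\right)\frac{m_0^3}{\left(1 - m_0^2\right)^{3/2}} \\
\gamma_2(X) &= 2(\pi - 3)\,\frac{m_0^4}{\left(1 - m_0^2\right)^2}
\end{aligned}$$

A.2.1.17 The skew t distribution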


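The multivariate case Let $X \sim \mathcal{SN}(0, \Omega, \eta)$ and $V \sim \chi^2_\nu/\nu$ be independent of $X$. Following Azzalini and Capitanio (2003), the mixture transformation $Y = \xi + V^{-1/2}X$ has a skew t distribution and we write $Y \sim \mathcal{ST}(\xi, \Omega, \eta, \nu)$. The density function of $Y$ is related to the multivariate t distribution as follows:

$$f(y) = 2\,t_n(y - \xi; \Omega, \nu)\,T_1\left(\eta^\top\omega^{-1}(y - \xi)\sqrt{\frac{\nu + n}{Q + \nu}}; \nu + n\right)$$

where $Q = (y - \xi)^\top\Omega^{-1}(y - \xi)$. We notice that we have:

$$\begin{aligned}
\Pr\{Y \leq y\} &= \Pr\{V^{-1/2}X \leq y - \xi\} \\
&= \Pr\{V^{-1/2}U \leq \omega^{-1}(y - \xi) \mid U_0 > 0\} \\
&= 2\Pr\{V^{-1/2}U \leq \omega^{-1}(y - \xi), U_0 > 0\} \\
&= 2\left(T_n(\omega^{-1}(y - \xi); C, \nu) - T_{n+1}(u_+; C_+(\delta), \nu)\right) \\
&= 2\,T_{n+1}(u_+; C_+(-\delta), \nu)
\end{aligned}$$

where $u_+ = (0, \omega^{-1}(y - \xi))$.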

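Like the multivariate skew normal distribution, the skew t distribution satisfies the closure property under linear transformation. Let $A$ be an $m \times n$ matrix and $Y \sim \mathcal{ST}(\xi, \Omega, \eta, \nu)$. We have:

$$AY \sim \mathcal{ST}(\xi_A, \Omega_A, \eta_A, \nu_A)$$

where:

$$\xi_A = A\xi, \qquad \Omega_A = A\Omega A^\top, \qquad \eta_A = \frac{\omega_A\Omega_A^{-1}B^\top\eta}{\left(1 + \eta^\top\left(C - B\Omega_A^{-1}B^\top\right)\eta\right)^{1/2}}, \qquad \nu_A = \nu$$

with $\omega = \mathrm{diag}^{1/2}(\Omega)$, $C = \omega^{-1}\Omega\omega^{-1}$, $\omega_A = \mathrm{diag}^{1/2}(\Omega_A)$ and $B = \omega^{-1}\Omega A^\top$. This property also implies that the marginal distribution of a subset of $Y$ is still a skew t distribution.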


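The univariate case The density function becomes:

$$f(y) = \frac{2}{\omega}\,t_1\left(\frac{y-\xi}{\omega}; \nu\right)T_1\left(\eta\left(\frac{y-\xi}{\omega}\right)\sqrt{\frac{\nu+1}{Q+\nu}}; \nu+1\right)$$

where $Q = (y - \xi)^2/\omega^2$. To compute the cumulative distribution function, we use the following result:

$$\Pr\{Y \leq y\} = 2\,T_2\left(0, \frac{y-\xi}{\omega}; -\delta, \nu\right)$$

Let $m_0$ and $v_0$ be two scalars defined as follows¹⁶:

$$m_0 = \delta\sqrt{\frac{\nu}{\pi}}\exp\left(\ln\Gamma\left(\frac{\nu-1}{2}\right) - \ln\Gamma\left(\frac{\nu}{2}\right)\right)$$

$$v_0 = \frac{\nu}{\nu-2} - m_0^2$$

16 We recall that $\delta = \eta/\sqrt{1+\eta^2}$.

As shown by Azzalini and Capitanio (2003), the moments of the univariate ST distribution are:

$$\begin{aligned}
\mu(Y) &= \xi + \omega m_0 \\
\sigma^2(Y) &= \omega^2 v_0 \\
\gamma_1(Y) &= m_0 v_0^{-3/2}\left(\frac{\nu(3-\delta^2)}{\nu-3} - \frac{3\nu}{\nu-2} + 2m_0^2\right) \\
\gamma_2(Y) &= v_0^{-2}\left(\frac{3\nu^2}{(\nu-2)(\nu-4)} - \frac{4m_0^2\nu(3-\delta^2)}{\nu-3} + \frac{6m_0^2\nu}{\nu-2} - 3m_0^4\right) - 3
\end{aligned}$$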


A.2.1.18 The Wishart distribution 


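Let $X_1, \ldots, X_n$ be $n$ independent Gaussian random vectors $\mathcal{N}_p(0, \Sigma)$. If we note $X = (X_1, \ldots, X_n)$ the $n \times p$ matrix, then $S = X^\top X$ is positive definite and follows a Wishart distribution $\mathcal{W}_p(\Sigma, n)$ with $n$ degrees of freedom and covariance matrix $\Sigma$. Its probability density function is:

$$f(S) = \frac{|S|^{(n-p-1)/2}}{2^{np/2}\,|\Sigma|^{n/2}\,\Gamma_p(n/2)}\exp\left(-\frac{1}{2}\mathrm{trace}\left(\Sigma^{-1}S\right)\right)$$

where $\Gamma_p(\alpha)$ is the multivariate gamma function:

$$\Gamma_p(\alpha) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma\left(\alpha + \frac{1-j}{2}\right)$$

We have $\mathbb{E}[S] = n\Sigma$ and $\mathrm{var}(S_{i,j}) = n\left(\Sigma_{i,j}^2 + \Sigma_{i,i}\Sigma_{j,j}\right)$. Here are the main properties:

1. if $A$ is a $q \times p$ matrix with rank $q$, then $ASA^\top \sim \mathcal{W}_q(A\Sigma A^\top, n)$;

2. if $\Sigma > 0$, then $\Sigma^{-1/2}S\Sigma^{-1/2} \sim \mathcal{W}_p(I_p, n)$;

3. if $S_i$ are independent random matrices $\mathcal{W}_p(\Sigma, n_i)$, then $\sum_{i=1}^m S_i \sim \mathcal{W}_p\left(\Sigma, \sum_{i=1}^m n_i\right)$;

4. if $a$ is a $p \times 1$ vector, we have:

$$\frac{a^\top Sa}{a^\top\Sigma a} \sim \chi^2(n)$$

5. $S^{-1}$ follows an inverse Wishart distribution $\mathcal{W}_p^{-1}(\Sigma, n)$ and we have:

$$\frac{a^\top\Sigma^{-1}a}{a^\top S^{-1}a} \sim \chi^2(n - p + 1)$$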


A.2.2 Special results 
A.2.2.1 Affine transformation of random vectors 


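The univariate case Let $X$ be a random variable with probability distribution $F$ and density function $f$. We consider the affine transformation $Y = a + bX$. If $b > 0$, the cumulative distribution function $H$ of $Y$ is:

$$H(y) = \Pr\{Y \leq y\} = \Pr\left\{X \leq \frac{y-a}{b}\right\} = F\left(\frac{y-a}{b}\right)$$

and its density function is:

$$h(y) = \partial_y H(y) = \frac{1}{b}f\left(\frac{y-a}{b}\right)$$

If $b < 0$, we obtain:

$$H(y) = \Pr\{Y \leq y\} = \Pr\left\{X \geq \frac{y-a}{b}\right\} = 1 - F\left(\frac{y-a}{b}\right)$$

and:

$$h(y) = \partial_y H(y) = -\frac{1}{b}f\left(\frac{y-a}{b}\right)$$

The mean and the variance of $Y$ are respectively equal to $a + b\cdot\mu(X)$ and $b^2\cdot\mathrm{var}(X)$. The centered moments are:

$$\mathbb{E}\left[(Y - \mu(Y))^r\right] = b^r\cdot\mathbb{E}\left[(X - \mu(X))^r\right]$$

We deduce that the excess kurtosis of $Y$ is the same as for $X$ whereas the skewness is equal to:

$$\gamma_1(Y) = \mathrm{sign}(b)\cdot\gamma_1(X)$$

As an illustration, we consider the random variable $Y = \mu + \sigma X$ with $X \sim \mathcal{N}(0, 1)$ and $\sigma > 0$. We obtain:

$$H(y) = \Phi\left(\frac{y-\mu}{\sigma}\right)$$

and:

$$h(y) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right)$$

We also deduce that:

$$H^{-1}(\alpha) = \mu + \sigma\Phi^{-1}(\alpha)$$

For the moments, we obtain $\mu(Y) = \mu$, $\sigma^2(Y) = \sigma^2$, $\gamma_1(Y) = 0$ and $\gamma_2(Y) = 0$.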

The multivariate case Let $X$ be a random vector of dimension $n$, $A$ an $m \times 1$ vector and $B$ an $m \times n$ matrix. We consider the affine transformation $Y = A + BX$. The moments verify $\mu(Y) = A + B\mu(X)$ and $\mathrm{cov}(Y) = B\,\mathrm{cov}(X)\,B^\top$. In the general case, it is not possible to find the distribution of $Y$. However, if $X \sim \mathcal{N}(\mu, \Sigma)$, $Y$ is also a normal random vector with $Y \sim \mathcal{N}(A + B\mu, B\Sigma B^\top)$.


A.2.2.2 Change of variables 


Let $X$ be a random variable, whose probability density function is $f(x)$. We consider the change of variable $Y = \varphi(X)$. If the function $\varphi$ is monotonic, the probability density function $g(y)$ of $Y$ is equal to:

$$g(y) = f(x)\left|\frac{dx}{dy}\right|$$

In the multivariate case, we note $(X_1, \ldots, X_n)$ the random vector with density function $f(x_1, \ldots, x_n)$. If the function $\varphi$ is bijective, we can show that the probability density function of $(Y_1, \ldots, Y_n) = \varphi(X_1, \ldots, X_n)$ is equal to:

$$g(y_1, \ldots, y_n) = f(x_1, \ldots, x_n)\left|\frac{1}{\det J_\varphi}\right|$$

where $J_\varphi$ is the Jacobian associated with the change of variables.


A.2.2.3 Relationship between density and quantile functions 


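Let $F(x)$ be a cumulative distribution function. The density function is $f(x) = \partial_x F(x)$. We note $\alpha = F(x)$ and $x = F^{-1}(\alpha)$. Since $F^{-1}(F(x)) = x$, we have:

$$\frac{\partial F^{-1}(F(x))}{\partial x} = \frac{\partial F^{-1}(\alpha)}{\partial\alpha}\cdot\frac{\partial F(x)}{\partial x} = 1$$

We deduce that:

$$\frac{\partial F^{-1}(\alpha)}{\partial\alpha} = \left(\frac{\partial F(x)}{\partial x}\right)^{-1} = \frac{1}{f\left(F^{-1}(\alpha)\right)}$$

and:

$$f(x) = \frac{1}{\partial_\alpha F^{-1}(F(x))}$$

For instance, we can use this result to compute the moments of the random variable $X$ with the quantile function instead of the density function:

$$\mathbb{E}[X^r] = \int_{-\infty}^{\infty}x^r f(x)\,dx = \int_0^1\left(F^{-1}(\alpha)\right)^r d\alpha$$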
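The moment formula above can be checked numerically. The sketch below assumes an exponential distribution with rate `lam`, whose quantile function is $F^{-1}(\alpha) = -\ln(1-\alpha)/\lambda$, and integrates over $(0,1)$ with a midpoint rule; the helper name `moment_from_quantile`, the choice of distribution and the grid size are illustrative assumptions.

```python
import math


def moment_from_quantile(quantile, r, n=200_000):
    """Approximate E[X^r] as the integral of quantile(alpha)^r over (0, 1)."""
    h = 1.0 / n
    # Midpoint rule: the end points 0 and 1 are never evaluated.
    return h * sum(quantile((i + 0.5) * h) ** r for i in range(n))


lam = 2.0  # hypothetical rate parameter


def exp_quantile(alpha):
    """Quantile function of the exponential distribution with rate lam."""
    return -math.log(1.0 - alpha) / lam


m1 = moment_from_quantile(exp_quantile, 1)  # theoretical value: 1 / lam
m2 = moment_from_quantile(exp_quantile, 2)  # theoretical value: 2 / lam^2
print(m1, m2)
```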


A.2.2.4 Conditional expectation in the case of the normal distribution 


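Let us consider a Gaussian random vector defined as follows:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{x,x} & \Sigma_{x,y} \\ \Sigma_{y,x} & \Sigma_{y,y} \end{pmatrix}\right)$$

The conditional probability distribution of $Y$ given $X = x$ is a multivariate normal distribution. We have:

$$\mu_{y|x} = \mathbb{E}[Y \mid X = x] = \mu_y + \Sigma_{y,x}\Sigma_{x,x}^{-1}(x - \mu_x)$$

and:

$$\Sigma_{y,y|x} = \sigma^2[Y \mid X = x] = \Sigma_{y,y} - \Sigma_{y,x}\Sigma_{x,x}^{-1}\Sigma_{x,y}$$

We deduce that:

$$Y = \mu_y + \Sigma_{y,x}\Sigma_{x,x}^{-1}(x - \mu_x) + u$$

where $u$ is a centered Gaussian random variable with variance $\sigma^2 = \Sigma_{y,y|x}$. It follows that:

$$Y = \underbrace{\left(\mu_y - \Sigma_{y,x}\Sigma_{x,x}^{-1}\mu_x\right)}_{\beta_0} + \underbrace{\Sigma_{y,x}\Sigma_{x,x}^{-1}}_{\beta^\top}x + u$$

We recognize the linear regression of $Y$ on a constant and a set of exogenous variables $X$:

$$Y = \beta_0 + \beta^\top X + u$$

Moreover, we have:

$$R^2 = 1 - \frac{\sigma^2}{\Sigma_{y,y}} = \frac{\Sigma_{y,x}\Sigma_{x,x}^{-1}\Sigma_{x,y}}{\Sigma_{y,y}}$$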
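A small numerical illustration of these formulas in the case where both $X$ and $Y$ are scalar (so that $\Sigma_{x,x}$, $\Sigma_{x,y}$ and $\Sigma_{y,y}$ are numbers) may help. The function name and the input values are assumptions chosen for the example.

```python
def conditional_normal(mu_x, mu_y, s_xx, s_xy, s_yy, x):
    """Conditional moments of Y given X = x for a bivariate Gaussian vector.

    Returns (conditional mean, conditional variance, beta_0, beta, R^2).
    """
    beta = s_xy / s_xx                      # regression slope
    beta_0 = mu_y - beta * mu_x             # regression intercept
    cond_mean = mu_y + beta * (x - mu_x)    # equals beta_0 + beta * x
    cond_var = s_yy - s_xy * s_xy / s_xx    # variance of the residual u
    r_squared = 1.0 - cond_var / s_yy
    return cond_mean, cond_var, beta_0, beta, r_squared


# Hypothetical inputs: var(X) = 4, cov(X, Y) = 1.2, var(Y) = 1.
cond_mean, cond_var, beta_0, beta, r_squared = conditional_normal(
    mu_x=1.0, mu_y=2.0, s_xx=4.0, s_xy=1.2, s_yy=1.0, x=3.0
)
print(cond_mean, cond_var, beta_0, beta, r_squared)
```

With these inputs the correlation is $1.2/(2 \times 1) = 0.6$, so $R^2$ is the squared correlation, as expected in a simple linear regression.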


A.2.2.5 Calculation of a useful integral function in credit risk models 


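We consider the following integral:

$$I = \int_{-\infty}^{c}\Phi(a + bx)\,\phi(x)\,dx$$

We have:

$$\begin{aligned}
I &= \int_{-\infty}^{c}\left(\int_{-\infty}^{a+bx}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right)dy\right)\phi(x)\,dx \\
&= \frac{1}{2\pi}\int_{-\infty}^{c}\int_{-\infty}^{a+bx}\exp\left(-\frac{y^2 + x^2}{2}\right)dy\,dx
\end{aligned}$$

By considering the change of variables $(x, z) = \varphi(x, y)$ such that $z = y - bx$, we obtain¹⁷:

$$I = \frac{1}{2\pi}\int_{-\infty}^{c}\int_{-\infty}^{a}\exp\left(-\frac{z^2 + 2bzx + b^2x^2 + x^2}{2}\right)dz\,dx$$

If we consider the new change of variable $t = (1 + b^2)^{-1/2}z$ and use the notation $\delta = 1 + b^2$, we have:

$$\begin{aligned}
I &= \frac{\sqrt{\delta}}{2\pi}\int_{-\infty}^{c}\int_{-\infty}^{a/\sqrt{\delta}}\exp\left(-\frac{\delta t^2 + 2b\sqrt{\delta}\,tx + \delta x^2}{2}\right)dt\,dx \\
&= \frac{\sqrt{\delta}}{2\pi}\int_{-\infty}^{c}\int_{-\infty}^{a/\sqrt{\delta}}\exp\left(-\frac{\delta}{2}\left(t^2 + \frac{2b}{\sqrt{\delta}}tx + x^2\right)\right)dt\,dx
\end{aligned}$$

We recognize the expression of the cumulative bivariate normal distribution¹⁸, whose correlation parameter $\rho$ is equal to $-b/\sqrt{\delta}$:

$$\int_{-\infty}^{c}\Phi(a + bx)\,\phi(x)\,dx = \Phi_2\left(c, \frac{a}{\sqrt{1+b^2}}; \frac{-b}{\sqrt{1+b^2}}\right)$$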
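The result can be checked numerically in two special cases where $\Phi_2$ has a closed form, so that no bivariate normal routine is needed. When $c \to \infty$, $\Phi_2(\infty, u; \rho) = \Phi(u)$. When $a = c = 0$, the orthant probability formula gives $\Phi_2(0, 0; \rho) = 1/4 + \arcsin(\rho)/(2\pi)$. The sketch below relies on these two facts; the function name, the truncation of the lower bound at $-10$, the grid size and the test values are assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def integral_phi(a, b, c, lower=-10.0, n=20_000):
    """Midpoint approximation of the integral of Phi(a + b x) phi(x) over [lower, c]."""
    h = (c - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * h
        total += STD_NORMAL.cdf(a + b * x) * STD_NORMAL.pdf(x)
    return total * h


a, b = 0.5, 1.5  # hypothetical values

# Case 1: c large enough to stand for +infinity.
lhs_inf = integral_phi(a, b, c=10.0)
rhs_inf = STD_NORMAL.cdf(a / math.sqrt(1.0 + b * b))

# Case 2: a = c = 0, using the orthant probability of the bivariate normal.
rho = -b / math.sqrt(1.0 + b * b)
lhs_orthant = integral_phi(0.0, b, c=0.0)
rhs_orthant = 0.25 + math.asin(rho) / (2.0 * math.pi)

print(lhs_inf, rhs_inf, lhs_orthant, rhs_orthant)
```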


17 We use the fact that the Jacobian of $\varphi(x, y)$ has the following expression:

$$J_\varphi = \begin{pmatrix} 1 & 0 \\ -b & 1 \end{pmatrix}$$

and its determinant $|J_\varphi|$ is equal to 1.
18 We recall that $\Phi_2(x, y; \rho)$ is the cumulative distribution function of the bivariate Gaussian vector $(X, Y)$ with correlation $\rho$ on the space $[-\infty, x] \times [-\infty, y]$.

A.3 Stochastic analysis


In what follows, we recall the main results of stochastic analysis related to Brownian 
motions and stochastic differential equations. Most of them can be found in Gikhman and 
Skorokhod (1972), Liptser and Shiryaev (1974), Friedman (1975), Karatzas and Shreve 
(1991) and Øksendal (2010). But before that, we introduce some definitions and notations. 


The probability space is denoted by $(\Omega, \mathcal{F}, \mathbb{P})$ where $\Omega$ is the sample space, $\mathcal{F}$ is the $\sigma$-algebra representing the collection of all events and $\mathbb{P}$ is the probability measure.

A random (or stochastic) process $X = \{X(t) : t \in T\}$ is a collection of random variables $X(t)$ where $T = [0, \infty)$ is the index set.

A filtration $\mathcal{F}_t$ on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is an increasing sequence of $\sigma$-algebras included in $\mathcal{F}$:

$$\mathcal{F}_s \subset \mathcal{F}_t \subset \mathcal{F} \qquad \forall\, t \geq s$$

The filtration represents the time evolution of the information produced by the stochastic process $X$.

The random process $X$ is $\mathcal{F}_t$-adapted if $X(t)$ is $\mathcal{F}_t$-measurable for all fixed $t \in T$, meaning that the value of $X(t)$ depends only on $\mathcal{F}_t$. In other words, the value of $X(t)$ cannot depend on unknown future data.

The random process $X$ is a martingale with respect to the filtration $\mathcal{F}_t$ if $\mathbb{E}|X(t)| < \infty$ (or $\mathbb{E}|X^2(t)| < \infty$) and $\mathbb{E}[X(t) \mid \mathcal{F}_s] = x_s$ for $s \leq t$, where $x_s$ is the realization of $X(s)$.

The random process $X$ is stationary if:

$$\mathbb{P}\{X(s) \in A\} = \mathbb{P}\{X(t) \in A\} \qquad \forall\,(s, t) \in T^2$$

It is weak-sense stationary if the first moment and the autocovariance do not vary with respect to time:

$$\begin{aligned}
\mathbb{E}\left[X^2(t)\right] &< \infty \\
\mathbb{E}[X(s)] &= \mathbb{E}[X(t)] \\
\mathbb{E}[X(s)X(t)] &= \mathbb{E}[X(s+u)X(t+u)]
\end{aligned}$$

where $u > 0$. In the case where $X$ is a Gaussian random process, the two definitions are equivalent.

The stochastic process $X$ is Markov if the probability distribution of $X(t)$ conditionally to the filtration $\mathcal{F}_s$ is equal to the probability distribution of $X(t)$ conditionally to the realization $x_s$:

$$\mathbb{P}\{X(t) \in A \mid \mathcal{F}_s\} = \mathbb{P}\{X(t) \in A \mid X(s) = x_s\}$$

This implies that we don't need to know all the information before $s$, but only the last value taken by the process.


The random process $X$ is continuous at time $t$ if, for all $\varepsilon > 0$,

$$\lim_{s \to t}\mathbb{P}\{|X(t) - X(s)| > \varepsilon\} = 0$$

$X$ is said to be a continuous stochastic process on $T$ if $X$ is continuous for any fixed $t \in T$.

A.3.1 Brownian motion and Wiener process
The stochastic process $B = \{B(t) : t \in T\}$ is a Brownian motion if:

1. $B(0) = 0$;

2. for every partition $0 = t_0 < t_1 < t_2 < \cdots < t_n$, the random variables $B(t_i) - B(t_{i-1})$ are independent;

3. $B(t) - B(s)$ is normally distributed with $\mathbb{E}[B(t) - B(s)] = \mu(t - s)$ and $\mathrm{var}(B(t) - B(s)) = \sigma^2(t - s)$ for $t \geq s$.

If $\mu = 0$ and $\sigma = 1$, we obtain the standard Brownian motion. It corresponds to the Wiener process¹⁹ and we denote it by $W(t)$. Here, we list the four main properties of $W(t)$:

1. $\mathbb{E}[W(t)] = 0$;

2. $\mathrm{cov}(W(s), W(t)) = \mathbb{E}[W(s)W(t)] = \min(s, t)$;

3. $W(t)$ is a martingale;

4. the process $W$ is continuous.

Notice that Wiener paths are not differentiable (Friedman, 1975), meaning that $\partial_t W(t)$ is not defined. We can also show that the Wiener process is invariant in law under various transformations: $c^{-1/2}W(ct) \overset{d}{=} W(t)$ if $c > 0$ (rescaling), $tW(t^{-1}) \overset{d}{=} W(t)$ (inversion) and $W(1) - W(1 - t) \overset{d}{=} W(t)$ if $t \in [0, 1]$ (time reversibility).

The multidimensional Wiener process $W(t) = (W_1(t), \ldots, W_n(t))$ satisfies the following properties:

• each component $W_i(t)$ is a Brownian motion;

• the different components are correlated:

$$\mathbb{E}[W_i(s)W_j(t)] = \rho_{i,j}\min(s, t)$$

We note $\rho$ the correlation matrix of $W(t)$:

$$\mathbb{E}\left[W(t)W(t)^\top\right] = \rho\,t$$

It implies that the density function of $W(t)$ is the multivariate normal pdf:

$$f(x) = (2\pi)^{-n/2}\,|\rho\,t|^{-1/2}\exp\left(-\frac{1}{2}x^\top(\rho\,t)^{-1}x\right)$$


Remark 214 Let $W(t)$ be a multivariate Wiener process with correlation matrix $\rho$. We have:

$$W(t) = AW^\star(t)$$

where $AA^\top = \rho$ and $W^\star(t)$ is a vector of independent Wiener processes.
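Remark 214 can be illustrated in dimension two, where the Cholesky factor of the correlation matrix is known in closed form: $A = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}$. The sketch below draws $W(t) = AW^\star(t)$ at a single date and checks that the sample cross-moment is close to $\rho\,t$. The function name, the seed and the numerical values are assumptions.

```python
import math
import random


def correlated_wiener_2d(rho, t, rng):
    """Draw (W1(t), W2(t)) with E[W1(t) W2(t)] = rho * t using W = A W*."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Independent Wiener processes at time t: sqrt(t) * z1 and sqrt(t) * z2.
    w1 = math.sqrt(t) * z1
    w2 = math.sqrt(t) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return w1, w2


rng = random.Random(42)
rho, t, n_draws = 0.5, 2.0, 20_000  # hypothetical values
samples = [correlated_wiener_2d(rho, t, rng) for _ in range(n_draws)]
cross_moment = sum(w1 * w2 for w1, w2 in samples) / n_draws
print(cross_moment, rho * t)
```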


19 From a historical point of view, the Brownian motion and the Wiener process were derived in a different manner. Today, the two terms are equivalent.

A.3.2 Stochastic integral


Let $f(t)$ be a stochastic function that defines the stochastic process²⁰ $X$. We assume that $f(t)$ is a random step function on $[a, b]$ and we denote by $\Delta = \{a = t_0 < t_1 < \cdots < t_n = b\}$ the associated partition. We have $f(t) = f(t_i)$ if $t \in [t_i, t_{i+1}[$. We introduce the notation:

$$\int_a^b f(t)\,dW(t) = \sum_{i=0}^{n-1}f(t_i)\left(W(t_{i+1}) - W(t_i)\right)$$

$\int_a^b f(t)\,dW(t)$ is called the stochastic integral of the random process $X = \{f(t), t \geq 0\}$ on $[a, b]$ with respect to the Wiener process. If $f(t)$ is non-random, we have an integration by parts (IPP) formula:

$$\int_a^b f(t)\,dW(t) = f(b)W(b) - f(a)W(a) - \int_a^b f'(t)W(t)\,dt$$

We deduce that:

$$\int_a^b dW(t) = W(b) - W(a)$$

In the case of a general function $f(t)$, the stochastic integral is defined as the limit in probability of the Riemann sum²¹:

$$\int_a^b f(t)\,dW(t) = \lim_{n \to \infty}\sum_{i=0}^{n-1}f(t_i)\left(W(t_{i+1}) - W(t_i)\right)$$

Like the Riemann-Stieltjes integral, it satisfies the linearity property:

$$\int_a^b(\alpha f + \beta g)(t)\,dW(t) = \alpha\int_a^b f(t)\,dW(t) + \beta\int_a^b g(t)\,dW(t)$$

and the Chasles decomposition:

$$\int_a^c f(t)\,dW(t) = \int_a^b f(t)\,dW(t) + \int_b^c f(t)\,dW(t)$$

where $(\alpha, \beta) \in \mathbb{R}^2$ and $a < b < c$. We also have:

$$\mathbb{E}\left[\int_a^b f(t)\,dW(t)\right] = 0$$

Another important result is the Itô isometry that is useful for computing the variance of the stochastic integral:

$$\mathbb{E}\left[\left(\int_a^b f(t)\,dW(t)\right)^2\right] = \mathbb{E}\left[\int_a^b f^2(t)\,dt\right]$$

More generally, we have:

$$\mathbb{E}\left[\int_a^b f(t)\,dW(t)\int_a^b g(t)\,dW(t)\right] = \mathbb{E}\left[\int_a^b f(t)g(t)\,dt\right]$$

20 This is equivalent to saying that $X = \{X(t) = f(t), t \geq 0\}$ is a stochastic process.
21 This construction is valid only if the random process $X$ is $\mathcal{F}_t$-adapted.

Remark 215 If $f(t)$ is a non-random function, then $\int_a^b f(t)\,dW(t)$ is a Gaussian random variable, whose mean is zero and variance is equal to $\int_a^b f^2(t)\,dt$.
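Remark 215 can be illustrated by simulation. The sketch below approximates $\int_0^1 t\,dW(t)$ by the left-point Riemann sum used in the definition of the stochastic integral, and compares the sample mean and variance with $0$ and $\int_0^1 t^2\,dt = 1/3$. The integrand, the number of steps, the number of paths and the function name are assumptions made for the example; the discretization introduces a small bias in the variance.

```python
import math
import random


def ito_integral_riemann(f, a, b, n_steps, rng):
    """Left-point Riemann sum of f(t_i) * (W(t_{i+1}) - W(t_i)) on [a, b]."""
    dt = (b - a) / n_steps
    total = 0.0
    for i in range(n_steps):
        t_i = a + i * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over [t_i, t_i + dt]
        total += f(t_i) * dw
    return total


rng = random.Random(1)
draws = [ito_integral_riemann(lambda t: t, 0.0, 1.0, 100, rng) for _ in range(5_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, sample_var)
```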


The Itô integral is a special case of the stochastic integral:

$$I(t) = \int_0^t f(s)\,dW(s)$$

It is also called the 'indefinite integral'. An important property is that any Itô integral is a martingale:

$$\mathbb{E}[I(t) \mid \mathcal{F}_s] = I(s)$$

A related result is the martingale representation theorem. Assuming that the filtration $\mathcal{F}_t$ is generated by a Wiener process, the theorem states that any $\mathcal{F}_t$-martingale $M(t)$ can be written as an Itô integral:

$$M(t) = \mathbb{E}[M(0)] + \int_0^t f(s)\,dW(s)$$


A.3.3 Stochastic differential equation and Itô's lemma


An Itô process is an adapted stochastic process that can be expressed as follows:

$$X(t) = X(0) + \int_0^t\mu(s)\,ds + \int_0^t\sigma(s)\,dW(s)$$

The stochastic differential equation (SDE) of $X(t)$ is:

$$dX(t) = \mu(t)\,dt + \sigma(t)\,dW(t)$$

The conditional process $dX(t)$ with respect to $\mathcal{F}_t$ is Gaussian with mean $\mu(t)\,dt$ and variance $\sigma^2(t)\,dt$. $\mu(t)$ is called the drift coefficient, while $\sigma(t)$ is the diffusion coefficient.


A.3.3.1 Existence and uniqueness of a stochastic differential equation 


Let $\mu(t, x)$ and $\sigma(t, x)$ be two measurable functions where $(t, x) \in T \times \mathbb{R}$. If $X(t)$ is a random process such that:

$$\begin{cases} dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t) & \text{(A.21a)} \\ X(0) = x_0 & \text{(A.21b)} \end{cases}$$

we say that $X(t)$ satisfies the stochastic differential equation (A.21a) with the initial condition (A.21b).

Friedman (1975) showed that the system (A.21) has a unique solution if there exist two scalars $K_1$ and $K_2$ such that, for all $(x, y) \in \mathbb{R}^2$, the following inequalities hold:

$$\begin{cases} |\mu(t, x) - \mu(t, y)| \leq K_1|x - y| \\ |\sigma(t, x) - \sigma(t, y)| \leq K_1|x - y| \end{cases}$$

and:

$$\begin{cases} |\mu(t, x)| \leq K_2(1 + |x|) \\ |\sigma(t, x)| \leq K_2(1 + |x|) \end{cases}$$

The previous theorem is not the only way to show the existence of a solution. For instance, a variant of this theorem is given by Karatzas and Shreve (1991). If there exist a scalar $K \in \mathbb{R}$ that verifies the inequality $|\mu(t, x) - \mu(t, y)| \leq K|x - y|$ for all $(x, y) \in \mathbb{R}^2$, and a strictly increasing function $h : \mathbb{R}_+ \to \mathbb{R}_+$ that satisfies the conditions²² $h(0) = 0$ and $\int_0^\varepsilon h^{-2}(u)\,du = \infty$ for all $\varepsilon > 0$ such that $|\sigma(t, x) - \sigma(t, y)| \leq h(|x - y|)$, then the solution of the SDE exists and is unique.


A.3.3.2 Relationship with diffusion processes 

A Markov process $X(t)$ is called a diffusion process if the transition probability function²³ $p(s, x; t, A)$ satisfies the two following properties:

1. For all $\varepsilon > 0$, $t \in T$ and $x \in \mathbb{R}$, we have:

$$\lim_{h \to 0}\frac{1}{h}\int_\Delta p(t, x; t + h, dy) = 0$$

where $\Delta = \{y \in \mathbb{R} : |x - y| > \varepsilon\}$.

2. For all $\varepsilon > 0$, $t \in T$ and $x \in \mathbb{R}$, there exist two functions $a(t, x)$ and $b(t, x)$ such that:

$$\lim_{h \to 0}\frac{1}{h}\int_\Delta(y - x)\,p(t, x; t + h, dy) = a(t, x)$$

and:

$$\lim_{h \to 0}\frac{1}{h}\int_\Delta(y - x)^2\,p(t, x; t + h, dy) = b(t, x)$$

where $\Delta = \{y \in \mathbb{R} : |x - y| \leq \varepsilon\}$.

This definition is given by Gikhman and Skorokhod (1972). They also showed that the unique solution of the SDE (A.21) is a diffusion process with $a(t, x) = \mu(t, x)$ and $b(t, x) = \sigma^2(t, x)$.


A.3.3.3 Itô calculus 


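To find the explicit solution of an SDE, we can use Itô calculus. We consider the following differential:

$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t)$$

Let $f(t, x)$ be a $C^2$ function. The stochastic differential equation of $Y(t) = f(t, X(t))$ is equal to:

$$\begin{aligned}
dY(t) &= df(t, X(t)) \\
&= \frac{\partial f}{\partial t}(t, X(t))\,dt + \frac{\partial f}{\partial x}(t, X(t))\,dX(t) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(t, X(t))\,\sigma^2(t, X(t))\,dt
\end{aligned}$$

22 For instance, we can take $h(u) = u^\alpha$ where $\alpha \geq 1/2$.
23 The transition probability function of the Markov process $X(t)$ is defined as:

$$p(s, x; t, A) = \mathbb{P}(X(t) \in A \mid X(s) = x)$$

The previous result is called the Itô formula. It can be viewed as a Taylor series with the following Itô rules: $dt \cdot dt = 0$, $dt \cdot dW(t) = 0$ and $dW(t) \cdot dW(t) = dt$.

Remark 216 In compact form, we have:

$$\begin{aligned}
dY &= \left(\frac{\partial f}{\partial t} + \mu\frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2\frac{\partial^2 f}{\partial x^2}\right)dt + \sigma\frac{\partial f}{\partial x}\,dW \\
&= \left(\frac{\partial f}{\partial t} + \mathcal{A}_t f\right)dt + \sigma\frac{\partial f}{\partial x}\,dW
\end{aligned}$$

where $\mathcal{A}_t f$ is the infinitesimal generator of $X$:

$$\mathcal{A}_t f = \frac{1}{2}\sigma^2\frac{\partial^2 f}{\partial x^2} + \mu\frac{\partial f}{\partial x}$$

In the case where $X(t) = W(t)$, we obtain:

$$df(t, W(t)) = \partial_t f(t, W(t))\,dt + \frac{1}{2}\partial_x^2 f(t, W(t))\,dt + \partial_x f(t, W(t))\,dW(t)$$

If we now consider two stochastic processes $X_1(t)$ and $X_2(t)$ that depend on the same Wiener process:

$$\begin{cases} dX_1(t) = \mu_1(t, X_1(t))\,dt + \sigma_1(t, X_1(t))\,dW(t) \\ dX_2(t) = \mu_2(t, X_2(t))\,dt + \sigma_2(t, X_2(t))\,dW(t) \end{cases}$$

the Itô formula becomes:

$$d(X_1(t)X_2(t)) = X_1(t)\,dX_2(t) + X_2(t)\,dX_1(t) + dX_1(t) \cdot dX_2(t)$$

where:

$$dX_1(t) \cdot dX_2(t) = \sigma_1(t, X_1(t))\,\sigma_2(t, X_2(t))\,dt$$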
A.3.3.4 Extension to the multidimensional case 


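Let $X(t) = (X_1(t), \ldots, X_m(t))$ be a random vector process. We consider the functions $\mu : T \times \mathbb{R}^m \to \mathbb{R}^m$ and $\sigma : T \times \mathbb{R}^m \to \mathbb{R}^{m \times n}$. The multidimensional SDE is defined as:

$$\begin{cases} dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t) \\ X(0) = x_0 \end{cases}$$

where $x_0$ is a vector of dimension $m$. The Itô formula applied to $Y(t) = f(t, X(t))$ is:

$$\begin{aligned}
dY(t) &= df(t, X(t)) \\
&= \partial_t f(t, X(t))\,dt + \partial_x f(t, X(t))^\top dX(t) + \frac{1}{2}\,dX(t)^\top\,\partial_x^2 f(t, X(t))\,dX(t)
\end{aligned}$$

We finally obtain²⁴:

$$\begin{aligned}
dY(t) = {} & \left(\partial_t f(t, X(t)) + \partial_x f(t, X(t))^\top\mu(t, X(t)) + \frac{1}{2}\mathrm{trace}\left(\sigma(t, X(t))^\top\,\partial_x^2 f(t, X(t))\,\sigma(t, X(t))\,\rho\right)\right)dt \\
& + \partial_x f(t, X(t))^\top\sigma(t, X(t))\,dW(t)
\end{aligned}$$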


24 The Itô rules are $dt \cdot dt = 0$, $dt \cdot dW(t) = 0$ and $dW(t) \cdot dW(t)^\top = \rho\,dt$.

If we apply the Itô formula to $Y(t) = X_1(t)X_2(t)$, we obtain:

$$\begin{aligned}
dY(t) = {} & \mu_1(t, X_1(t))\,X_2(t)\,dt + \mu_2(t, X_2(t))\,X_1(t)\,dt + \rho_{1,2}\,\sigma_1(t, X_1(t))\,\sigma_2(t, X_2(t))\,dt \\
& + \sigma_1(t, X_1(t))\,X_2(t)\,dW_1(t) + \sigma_2(t, X_2(t))\,X_1(t)\,dW_2(t) \\
= {} & X_1(t)\,dX_2(t) + X_2(t)\,dX_1(t) + dX_1(t) \cdot dX_2(t)
\end{aligned}$$

where:

$$dX_1(t) \cdot dX_2(t) = \rho_{1,2}\,\sigma_1(t, X_1(t))\,\sigma_2(t, X_2(t))\,dt$$

In the case where $W_1(t) = W_2(t) = W(t)$, the correlation $\rho_{1,2}$ is equal to one and we retrieve the previous result.


Using the previous framework, we also deduce that the integration by parts formula becomes:

$$\int_a^b X_1(t)\,dX_2(t) = X_1(b)X_2(b) - X_1(a)X_2(a) - \int_a^b X_2(t)\,dX_1(t) - \int_a^b dX_1(t) \cdot dX_2(t)$$

In the case where $X_1(t) = f(t)$ is a non-random process and $X_2(t) = W(t)$, we retrieve the classical IPP:

$$\int_a^b f(t)\,dW(t) = f(b)W(b) - f(a)W(a) - \int_a^b f'(t)W(t)\,dt$$

because $\int_a^b f'(t)\,dt\,dW(t) = 0$.

Remark 217 $dX_1(t) \cdot dX_2(t)$ is also called the quadratic variation and we note:

$$dX_1(t) \cdot dX_2(t) = d\langle X_1(t), X_2(t)\rangle$$

Using the notation $\langle X_1(t)\rangle = \langle X_1(t), X_1(t)\rangle$, the quadratic variation satisfies the bilinearity property and the polarization identity:

$$\langle X_1(t), X_2(t)\rangle = \frac{\langle X_1(t) + X_2(t)\rangle - \langle X_1(t)\rangle - \langle X_2(t)\rangle}{2}$$


A.3.4 Feynman-Kac formula 
We consider the state variable $X(t)$ defined by:

$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t)$$

and $\mathcal{A}_t V$ the infinitesimal generator of the diffusion process:

$$\mathcal{A}_t V(t, x) = \frac{1}{2}\sigma^2(t, x)\frac{\partial^2 V(t, x)}{\partial x^2} + \mu(t, x)\frac{\partial V(t, x)}{\partial x}$$

Under the following assumptions:

1. $\mu(t, x)$, $\sigma(t, x)$, $g(t, x)$ and $h(t, x)$ are Lipschitz and bounded on $[0, T] \times \mathbb{R}$;

2. $f(x)$ is a continuous function of class $C^2$;

3. $f(x)$ and $g(x)$ grow exponentially²⁵;

the solution of the Cauchy problem:

$$\begin{cases} -\partial_t V(t, x) + h(t, x)\,V(t, x) = \mathcal{A}_t V(t, x) + g(t, x) \\ V(T, x) = f(x) \end{cases} \qquad \text{(A.22)}$$

is unique and given by:

$$V(t, x) = \mathbb{E}\left[\beta(T)\,f(X(T)) + \int_t^T\beta(s)\,g(s, X(s))\,ds \,\middle|\, X(t) = x\right] \qquad \text{(A.23)}$$

where:

$$\beta(s) = \exp\left(-\int_t^s h(u, X(u))\,du\right)$$

The Feynman-Kac formula states that the solution of the parabolic PDE (A.22) can be found by calculating the conditional expectation (A.23).
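In the case where $h(t, x) = g(t, x) = 0$, we obtain the backward Chapman-Kolmogorov equation:

$$\partial_t V(t, x) = -\mu(t, x)\frac{\partial V(t, x)}{\partial x} - \frac{1}{2}\sigma^2(t, x)\frac{\partial^2 V(t, x)}{\partial x^2}$$

where $V(T, x) = f(x)$ and:

$$V(t, x) = \mathbb{E}[f(X(T)) \mid X(t) = x]$$

If $f(x) = \mathbb{1}\{x \leq x_T\}$, we obtain the probability distribution:

$$\begin{aligned}
V(t, x) &= \mathbb{E}[\mathbb{1}\{X(T) \leq x_T\} \mid X(t) = x] \\
&= \mathbb{P}\{X(T) \leq x_T \mid X(t) = x\}
\end{aligned}$$

To obtain the density function, we set $f(x) = \mathbb{1}\{x = x_T\}$ and we have:

$$V(t, x) = \mathbb{P}\{X(T) = x_T \mid X(t) = x\}$$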
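A simple case where both sides of the Feynman-Kac representation are available in closed form is $X(t) = W(t)$ (that is, $\mu = 0$ and $\sigma = 1$) with $h = g = 0$ and $f(x) = x^2$. The function $V(t, x) = x^2 + (T - t)$ satisfies $\partial_t V = -\frac{1}{2}\partial_x^2 V$ and $V(T, x) = x^2$, so it should coincide with $\mathbb{E}[W^2(T) \mid W(t) = x]$. The sketch below checks this by Monte Carlo; the choice of $f$, the function name and the numerical values are assumptions made for the illustration.

```python
import math
import random


def feynman_kac_mc(f, x, tau, n_draws, rng):
    """Monte Carlo estimate of E[f(W(T)) | W(t) = x] with tau = T - t."""
    scale = math.sqrt(tau)
    return sum(f(x + scale * rng.gauss(0.0, 1.0)) for _ in range(n_draws)) / n_draws


rng = random.Random(7)
x, tau = 1.0, 2.0  # hypothetical starting point and remaining time
v_mc = feynman_kac_mc(lambda y: y * y, x, tau, 50_000, rng)
v_pde = x * x + tau  # closed-form solution of the backward equation
print(v_mc, v_pde)
```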


Remark 218 The Feynman-Kac formula is valid in the multivariate case by considering the following infinitesimal generator of the diffusion process:

$$\mathcal{A}_t V(t, x) = \frac{1}{2}\mathrm{trace}\left(\sigma(t, x)^\top\,\partial_x^2 V(t, x)\,\sigma(t, x)\,\rho\right) + \partial_x V(t, x)^\top\mu(t, x)$$

25 This implies that there exist two scalars $K > 0$ and $\xi > 0$ such that $|f(x)| \leq Ke^{\xi x^2}$.

A.3.5 Girsanov theorem


Let $W$ be a Wiener process on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. If the process $g(t)$ satisfies the Novikov condition:

$$\mathbb{E}\left[\exp\left(\frac{1}{2}\int_0^t g^2(s)\,ds\right)\right] < \infty$$

then $W^{\mathbb{Q}}$ defined by $W^{\mathbb{Q}}(t) = W(t) - \int_0^t g(s)\,ds$ is a Wiener process on the probability space $(\Omega, \mathcal{F}, \mathbb{Q})$. The change of measure is given by the Radon-Nikodym derivative:

$$\frac{d\mathbb{Q}}{d\mathbb{P}} = M(t) = \exp\left(\int_0^t g(s)\,dW(s) - \frac{1}{2}\int_0^t g^2(s)\,ds\right)$$

Moreover, $M(t)$ is an $\mathcal{F}_t$-martingale.

Remark 219 If we consider the state variable $X(t)$ defined by:

$$dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dW(t)$$

the Girsanov theorem states that changing the probability measure is equivalent to changing the drift of the diffusion. Since $dW(t) = dW^{\mathbb{Q}}(t) + g(t)\,dt$, we have:

$$dX(t) = \left(\mu(t, X(t)) + g(t)\,\sigma(t, X(t))\right)dt + \sigma(t, X(t))\,dW^{\mathbb{Q}}(t)$$

The Girsanov theorem can be extended to the multidimensional Wiener process. In this case, $g(t)$ and $W^{\mathbb{Q}}(t) = W(t) - \int_0^t g(s)\,ds$ are two vector processes and we have:

$$M(t) = \exp\left(\int_0^t g(s)^\top dW(s) - \frac{1}{2}\int_0^t g(s)^\top g(s)\,ds\right)$$


A.3.6 Fokker-Planck equation 


With the backward Chapman-Kolmogorov equation, we can compute the probability of the event $\{X(T) = x_T\}$ conditionally to $X(t) = x$. From a numerical point of view, this approach is generally not efficient because we need to solve one PDE for each value of $x_T$. Another way to compute this probability is to consider the forward Chapman-Kolmogorov equation:

$$\begin{cases} \partial_t U(t, x) = -\partial_x\left[\mu(t, x)\,U(t, x)\right] + \frac{1}{2}\partial_x^2\left[\sigma^2(t, x)\,U(t, x)\right] \\ U(s, x) = \mathbb{1}\{x = x_s\} \end{cases}$$

where $s < t$. This PDE is known as the Fokker-Planck equation and its solution is:

$$\begin{aligned}
U(t, x) &= \mathbb{P}\{X(t) = x \mid X(s) = x_s\} \\
&= p(s, x_s; t, x)
\end{aligned}$$

In particular, we can calculate the density function $p(0, x_0; T, x_T)$.


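In the multidimensional case, the Fokker-Planck equation becomes:

$$\begin{aligned}
\partial_t U(t, x_1, \ldots, x_m) = {} & -\sum_{i=1}^m\partial_{x_i}\left[\mu_i(t, x_1, \ldots, x_m)\,U(t, x_1, \ldots, x_m)\right] \\
& + \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\partial^2_{x_i, x_j}\left[\Sigma_{i,j}(t, x_1, \ldots, x_m)\,U(t, x_1, \ldots, x_m)\right]
\end{aligned}$$

where:

$$\Sigma_{i,j}(t, x_1, \ldots, x_m) = \sum_{k_1=1}^n\sum_{k_2=1}^n\rho_{k_1,k_2}\,\sigma_{i,k_1}(t, x_1, \ldots, x_m)\,\sigma_{j,k_2}(t, x_1, \ldots, x_m)$$

In the diagonal bivariate case:

$$\begin{cases} dX_1(t) = \mu_1(t, X_1(t), X_2(t))\,dt + \sigma_1(t, X_1(t), X_2(t))\,dW_1(t) \\ dX_2(t) = \mu_2(t, X_1(t), X_2(t))\,dt + \sigma_2(t, X_1(t), X_2(t))\,dW_2(t) \\ \mathbb{E}[W_1(t)W_2(t)] = \rho_{1,2}\,t \end{cases}$$

we obtain:

$$\begin{aligned}
\partial_t U(t, x_1, x_2) = {} & -\partial_{x_1}\left[\mu_1(t, x_1, x_2)\,U(t, x_1, x_2)\right] - \partial_{x_2}\left[\mu_2(t, x_1, x_2)\,U(t, x_1, x_2)\right] \\
& + \frac{1}{2}\partial^2_{x_1}\left[\sigma_1^2(t, x_1, x_2)\,U(t, x_1, x_2)\right] + \frac{1}{2}\partial^2_{x_2}\left[\sigma_2^2(t, x_1, x_2)\,U(t, x_1, x_2)\right] \\
& + \rho_{1,2}\,\partial^2_{x_1, x_2}\left[\sigma_1(t, x_1, x_2)\,\sigma_2(t, x_1, x_2)\,U(t, x_1, x_2)\right]
\end{aligned}$$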


A.3.7 Reflection principle and stopping times 


A nonnegative random variable $\tau$ is a stopping time with respect to the stochastic process $W$ if the event $\{\tau \leq t\}$ depends only on $W(s)$ for $s \leq t$. This implies that $\{\tau \leq t\} \in \mathcal{F}_t$, meaning that the event cannot depend on the future path of the stochastic process. A particular case of stopping times is a hitting time. Let $\tau_x = \inf\{t : W(t) = x\}$ denote the first time when the Brownian motion hits the value $x > 0$. We can show that the hitting time $\tau_x$ is also a stopping time, and satisfies the strong Markov property:

$$W(\tau_x + t) - W(\tau_x) = W(\tau_x + t) - x \overset{d}{=} W(t)$$

Therefore, $W(\tau_x + t) - W(\tau_x)$ is a Brownian motion that is independent from $W(s)$ for $s \leq \tau_x$. This result generalizes the independent increments property when $\tau_x$ is not a fixed time, but a random time. Let us define $\tilde{W}(t)$ as follows:

$$\tilde{W}(t) = \begin{cases} W(t) & \text{if } t \leq \tau_x \\ 2x - W(t) & \text{otherwise} \end{cases}$$

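The reflection principle states that $\tilde{W}(t)$ is also a Brownian motion. Then, we can show that²⁶:

$$\Pr\{\tau_x \leq t\} = 2\Pr\{W(t) \geq x\} = 2\left(1 - \Phi\left(\frac{x}{\sqrt{t}}\right)\right)$$

26 We deduce that the density function of $\tau_x$ is equal to:

$$f(t) = \partial_t\Pr\{\tau_x \leq t\} = \frac{x}{t^{3/2}}\,\phi\left(\frac{x}{\sqrt{t}}\right)$$

Let $M(t) = \sup_{s \leq t}W(s)$ be the maximum of a Brownian motion. We have:

$$\{M(t) \geq x\} \Leftrightarrow \{\tau_x \leq t\}$$

It follows that:

$$\begin{aligned}
\Pr\{M(t) \geq x\} &= \Pr\{\tau_x \leq t\} \\
&= 2\Pr\{W(t) \geq x\} \\
&= 2\left(1 - \Phi\left(\frac{x}{\sqrt{t}}\right)\right)
\end{aligned}$$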
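The distribution of the maximum can be checked by simulating discretized Brownian paths and counting those that reach the level $x$ before $t$. This is only a rough sketch: the path is observed on a finite grid, so the simulated frequency slightly underestimates the continuous-time probability $2(1 - \Phi(x/\sqrt{t}))$. The function name, the grid size and the number of paths are assumptions.

```python
import math
import random
from statistics import NormalDist


def prob_max_exceeds(x, t, n_paths, n_steps, rng):
    """Frequency of discretized Brownian paths whose running maximum reaches x on [0, t]."""
    sd = math.sqrt(t / n_steps)
    hits = 0
    for _ in range(n_paths):
        w = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            if w >= x:  # the level x has been hit: stop following this path
                hits += 1
                break
    return hits / n_paths


rng = random.Random(2024)
x, t = 1.0, 1.0  # hypothetical barrier and horizon
simulated = prob_max_exceeds(x, t, n_paths=3_000, n_steps=250, rng=rng)
theoretical = 2.0 * (1.0 - NormalDist().cdf(x / math.sqrt(t)))
print(simulated, theoretical)
```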


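Another important result is the joint distribution of $(M(t), W(t))$, which is given by:

$$\begin{aligned}
\Pr\{M(t) \geq x, W(t) \leq y\} &= \Pr\{W(t) \geq 2x - y\} \\
&= 1 - \Phi\left(\frac{2x - y}{\sqrt{t}}\right)
\end{aligned}$$

where $x \geq 0$ and $y \leq x$. It follows that the joint density of $(M(t), W(t))$ is equal to:

$$f(x, y) = \frac{2x - y}{t^{3/2}}\sqrt{\frac{2}{\pi}}\exp\left(-\frac{(2x - y)^2}{2t}\right)$$

If we consider the Brownian motion with drift:

$$X(t) = \mu t + W(t)$$

the distribution of $M(t) = \sup_{s \leq t}X(s)$ becomes:

$$\Pr\{M(t) \leq x\} = \Phi\left(\frac{x - \mu t}{\sqrt{t}}\right) - e^{2\mu x}\,\Phi\left(\frac{-x - \mu t}{\sqrt{t}}\right) \qquad \text{(A.24)}$$

whereas the joint density of $(M(t), X(t))$ is equal to:

$$f(x, y) = \frac{2x - y}{t^{3/2}}\sqrt{\frac{2}{\pi}}\exp\left(\mu y - \frac{1}{2}\mu^2 t - \frac{(2x - y)^2}{2t}\right)$$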

A.3.8 Some diffusion processes 


A.3.8.1 Geometric Brownian motion 


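It is the solution of the following SDE:

$$\begin{cases} dX(t) = \mu X(t)\,dt + \sigma X(t)\,dW(t) \\ X(0) = x_0 \end{cases}$$

In order to find the explicit solution, we apply Itô's lemma to the stochastic process $Y(t) = \ln X(t)$ and we have²⁷:

$$\begin{aligned}
dY(t) &= \left(\frac{1}{X(t)}\mu X(t) - \frac{1}{2X^2(t)}\sigma^2X^2(t)\right)dt + \frac{1}{X(t)}\sigma X(t)\,dW(t) \\
&= \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW(t)
\end{aligned}$$

27 We have $f(t, x) = \ln x$, $\partial_t f(t, x) = 0$, $\partial_x f(t, x) = x^{-1}$ and $\partial_x^2 f(t, x) = -x^{-2}$.

where $Y(0) = \ln x_0$. We deduce that $Y(t)$ is a Gaussian random process²⁸:

$$\begin{aligned}
Y(t) &= \ln x_0 + \int_0^t\left(\mu - \frac{1}{2}\sigma^2\right)ds + \int_0^t\sigma\,dW(s) \\
&= \ln x_0 + \left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W(t)
\end{aligned}$$

It follows that:

$$\ln X(t) = \ln x_0 + \left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W(t)$$

or:

$$\begin{aligned}
X(t) &= \exp\left(\ln x_0 + \left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W(t)\right) \\
&= x_0\,e^{\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W(t)}
\end{aligned}$$

We obtain a log-normal random process, whose first moments are:

$$\begin{aligned}
\mathbb{E}[X(t)] &= \exp\left(\ln x_0 + \left(\mu - \frac{1}{2}\sigma^2\right)t + \frac{1}{2}\sigma^2t\right) \\
&= x_0\,e^{\mu t}
\end{aligned}$$

and:

$$\begin{aligned}
\mathrm{var}(X(t)) &= e^{2\ln x_0 + 2\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma^2t}\left(e^{\sigma^2t} - 1\right) \\
&= x_0^2\,e^{2\mu t}\left(e^{\sigma^2t} - 1\right)
\end{aligned}$$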
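Because the solution is explicit, $X(t)$ can be simulated exactly at a given date by drawing $W(t) \sim \mathcal{N}(0, t)$. The following sketch does so and compares the sample mean and variance with $x_0e^{\mu t}$ and $x_0^2e^{2\mu t}(e^{\sigma^2t} - 1)$. The function name and the parameter values are assumptions used for illustration.

```python
import math
import random


def gbm_terminal_value(x0, mu, sigma, t, rng):
    """Exact draw of X(t) = x0 * exp((mu - sigma^2 / 2) t + sigma W(t))."""
    w_t = rng.gauss(0.0, math.sqrt(t))
    return x0 * math.exp((mu - 0.5 * sigma * sigma) * t + sigma * w_t)


rng = random.Random(11)
x0, mu, sigma, t = 100.0, 0.05, 0.20, 1.0  # hypothetical parameters
draws = [gbm_terminal_value(x0, mu, sigma, t, rng) for _ in range(20_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

theoretical_mean = x0 * math.exp(mu * t)
theoretical_var = x0 ** 2 * math.exp(2.0 * mu * t) * (math.exp(sigma ** 2 * t) - 1.0)
print(sample_mean, theoretical_mean, sample_var, theoretical_var)
```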
A.3.8.2 Ornstein-Uhlenbeck process 
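We consider the SDE:

$$\begin{cases} dX(t) = a(b - X(t))\,dt + \sigma\,dW(t) \\ X(0) = x_0 \end{cases}$$

where $a > 0$. We notice that $\mathbb{E}[dX(t) \mid \mathcal{F}_t] = a(b - X(t))\,dt$. It follows that:

• if $X(t) < b$, then $\mathbb{E}[dX(t) \mid \mathcal{F}_t] \geq 0$ implying that $\mathbb{E}[X(t) + dX(t) \mid \mathcal{F}_t] \geq X(t)$;

• if $X(t) > b$, then $\mathbb{E}[dX(t) \mid \mathcal{F}_t] \leq 0$ implying that $\mathbb{E}[X(t) + dX(t) \mid \mathcal{F}_t] \leq X(t)$.

The coefficient $b$ is the mean-reversion parameter (or the long-term mean) whereas $a$ is the speed of reversion. If we apply Itô's lemma to $Y(t) = (b - X(t))\,e^{at}$, we obtain²⁹:

$$\begin{aligned}
dY(t) &= \left(a(b - X(t))\,e^{at} - a(b - X(t))\,e^{at}\right)dt - \sigma e^{at}\,dW(t) \\
&= -\sigma e^{at}\,dW(t)
\end{aligned}$$

and:

$$Y(t) = b - x_0 - \sigma\int_0^t e^{as}\,dW(s)$$

28 We have:

$$Y(t) \sim \mathcal{N}\left(\ln x_0 + \left(\mu - \frac{1}{2}\sigma^2\right)t, \sigma^2t\right)$$

29 We have $f(t, x) = (b - x)\,e^{at}$, $\partial_t f(t, x) = af(t, x)$, $\partial_x f(t, x) = -e^{at}$ and $\partial_x^2 f(t, x) = 0$.

Therefore, $Y(t)$ is a Gaussian random process, because $\int_0^t e^{as}\,dW(s)$ is a Gaussian random variable. We deduce that:

$$\begin{aligned}
X(t) &= b - e^{-at}\,Y(t) \\
&= e^{-at}x_0 + b\left(1 - e^{-at}\right) + \sigma\int_0^t e^{-a(t-s)}\,dW(s)
\end{aligned}$$

$X(t)$ is also a Gaussian random process, whose first two moments are:

$$\mathbb{E}[X(t)] = e^{-at}x_0 + b\left(1 - e^{-at}\right)$$

and:

$$\begin{aligned}
\mathrm{var}(X(t)) &= \mathbb{E}\left[\left(\sigma\int_0^t e^{-a(t-s)}\,dW(s)\right)^2\right] \\
&= \sigma^2\int_0^t e^{-2a(t-s)}\,ds \\
&= \sigma^2\left[\frac{e^{-2a(t-s)}}{2a}\right]_0^t \\
&= \frac{\sigma^2\left(1 - e^{-2at}\right)}{2a}
\end{aligned}$$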
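Since $X(t)$ is Gaussian with the two moments above, the process can be simulated without discretization error by applying the same formulas over each time step. The sketch below chains ten exact transitions and checks the terminal mean and variance. The function name, the number of steps and the parameter values are illustrative assumptions.

```python
import math
import random


def ou_exact_step(x, a, b, sigma, dt, rng):
    """Exact transition of the Ornstein-Uhlenbeck process over a step of length dt."""
    mean = math.exp(-a * dt) * x + b * (1.0 - math.exp(-a * dt))
    var = sigma * sigma * (1.0 - math.exp(-2.0 * a * dt)) / (2.0 * a)
    return rng.gauss(mean, math.sqrt(var))


def ou_terminal_value(x0, a, b, sigma, t, n_steps, rng):
    """Simulate X(t) by chaining n_steps exact transitions."""
    x = x0
    for _ in range(n_steps):
        x = ou_exact_step(x, a, b, sigma, t / n_steps, rng)
    return x


rng = random.Random(5)
x0, a, b, sigma, t = 0.10, 2.0, 0.05, 0.10, 1.0  # hypothetical parameters
draws = [ou_terminal_value(x0, a, b, sigma, t, 10, rng) for _ in range(20_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

theoretical_mean = math.exp(-a * t) * x0 + b * (1.0 - math.exp(-a * t))
theoretical_var = sigma ** 2 * (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a)
print(sample_mean, theoretical_mean, sample_var, theoretical_var)
```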


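A.3.8.3 Cox-Ingersoll-Ross process

The CIR process is the solution of the SDE:

$$\begin{cases} dX(t) = a(b - X(t))\,dt + \sigma\sqrt{X(t)}\,dW(t) \\ X(0) = x_0 \end{cases}$$

where $a > 0$. It can be viewed as a modified Ornstein-Uhlenbeck process where the diffusion coefficient is $\sigma\sqrt{X(t)}$. This implies that the CIR process is positive, which explains why this process is frequently used for interest rate modeling. If we apply Itô's lemma to $Y(t) = (b - X(t))\,e^{at}$, we obtain:

$$\begin{aligned}
dY(t) &= \left(a(b - X(t))\,e^{at} - a(b - X(t))\,e^{at}\right)dt - e^{at}\sigma\sqrt{X(t)}\,dW(t) \\
&= -\sigma e^{at}\sqrt{X(t)}\,dW(t)
\end{aligned}$$

and:

$$\begin{aligned}
X(t) &= b - e^{-at}\,Y(t) \\
&= e^{-at}x_0 + b\left(1 - e^{-at}\right) + \sigma\int_0^t e^{-a(t-s)}\sqrt{X(s)}\,dW(s)
\end{aligned} \qquad \text{(A.25)}$$

We can show that:

$$X(t) = \frac{1}{c}\,\chi^2_\nu(\zeta)$$

where:

$$c = \frac{4a}{\left(1 - e^{-at}\right)\sigma^2}$$

and $\chi^2_\nu(\zeta)$ is the noncentral chi-squared random variable where $\nu = 4ab\sigma^{-2}$ is the number of degrees of freedom and $\zeta = cx_0e^{-at}$ is the noncentrality parameter. It follows that the probability density function of $X(t)$ is equal to:

$$f(x) = c\,f\left(cx; \nu, cx_0e^{-at}\right)$$

where $f(y; \nu, \zeta)$ is the probability density function of the noncentral chi-squared random variable $\chi^2_\nu(\zeta)$. Using Equation (A.25), we can also show that:

$$\mathbb{E}[X(t)] = e^{-at}x_0 + b\left(1 - e^{-at}\right)$$

and:

$$\begin{aligned}
\mathrm{var}(X(t)) &= \mathbb{E}\left[\left(\sigma\int_0^t e^{-a(t-s)}\sqrt{X(s)}\,dW(s)\right)^2\right] \\
&= \sigma^2\int_0^t e^{-2a(t-s)}\,\mathbb{E}[X(s)]\,ds \\
&= \sigma^2e^{-2at}\int_0^t\left((x_0 - b)\,e^{as} + b\,e^{2as}\right)ds \\
&= \sigma^2e^{-2at}\left[\frac{(x_0 - b)\,e^{as}}{a} + \frac{b\,e^{2as}}{2a}\right]_0^t \\
&= \sigma^2e^{-2at}\left(\frac{(x_0 - b)\,e^{at}}{a} + \frac{b\,e^{2at}}{2a} - \frac{x_0 - b}{a} - \frac{b}{2a}\right) \\
&= \frac{\sigma^2x_0}{a}\left(e^{-at} - e^{-2at}\right) + \frac{\sigma^2b}{2a}\left(1 - 2e^{-at} + e^{-2at}\right)
\end{aligned}$$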

A.3.8.4 Multidimensional processes 


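The multidimensional geometric Brownian motion is defined as:

$$\begin{cases} dX_j(t) = \mu_jX_j(t)\,dt + \sigma_jX_j(t)\,dW_j(t) \quad \text{for } j = 1, \ldots, n \\ X(0) = x_0 \end{cases}$$

where $X(t) = (X_1(t), \ldots, X_n(t))$ and $W(t) = (W_1(t), \ldots, W_n(t))$ is an $n$-dimensional Brownian motion with $\mathbb{E}\left[W(t)W(t)^\top\right] = \rho\,t$. The solution of the multidimensional SDE is a multivariate log-normal process with:

$$X_j(t) = X_j(0) \cdot \exp\left(\left(\mu_j - \frac{1}{2}\sigma_j^2\right)t + \sigma_jW_j(t)\right)$$

where $W(t) \sim \mathcal{N}_n(0, \rho\,t)$.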
Other multivariate stochastic processes are not very useful in finance, except stochastic volatility models. For instance, the Heston model is defined as follows:

$$\begin{cases} dX_1(t) = \mu X_1(t)\,dt + \sqrt{X_2(t)}\,X_1(t)\,dW_1(t) \\ dX_2(t) = a(b - X_2(t))\,dt + \sigma\sqrt{X_2(t)}\,dW_2(t) \end{cases}$$

where $\mathbb{E}[W_1(t)W_2(t)] = \rho\,t$. Therefore, the process $X_1(t)$ is a geometric Brownian motion with a stochastic volatility $\sigma(t) = \sqrt{X_2(t)}$ and the stochastic variance $X_2(t)$ is a CIR process. Another related process is the SABR model:

$$\begin{cases} dX_1(t) = X_2(t)\,X_1^\beta(t)\,dW_1(t) \\ dX_2(t) = \nu X_2(t)\,dW_2(t) \end{cases}$$

where $\mathbb{E}[W_1(t)W_2(t)] = \rho\,t$ and $\beta \in [0, 1]$. We notice that the stochastic volatility is a geometric Brownian motion and there are two special cases: $X_1(t)$ is a log-normal process ($\beta = 1$) or a normal process ($\beta = 0$).

A.4 Exercises
A.4.1 Discrete-time random process 


1. We consider the discrete-time random process $X_t$ defined by:

$$X_t = X_{t-1} + \varepsilon_t$$

where $X_0 = 0$ and $\varepsilon_t$ is an iid random process.

(a) We assume that $\varepsilon_t$ is a Bernoulli random variable $\mathcal{B}(p)$. Give the filtration $\mathcal{F}_t$ for $t = 0, 1, 2$.

(b) We assume that $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$. Show that $X_t$ is a martingale.

2. We consider the AR(1) process:

$$X_t = \phi X_{t-1} + \varepsilon_t$$

where $|\phi| < 1$ and $\varepsilon_t$ is an iid random process with $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$.

(a) Show that $X_t$ is a weak-sense stationary process.

(b) Show that $X_t$ is a strong-sense stationary process.

(c) Is $X_t$ a Markov process?

(d) Same questions with the MA(1) process:

$$X_t = \varepsilon_t + \theta\varepsilon_{t-1}$$


A.4.2 Properties of Brownian motion 


We consider the standard Brownian motion W (t). 


1. Demonstrate the three properties: 


(a) $E[W(t)] = 0$;
(b) $\mathrm{cov}(W(s), W(t)) = \min(s, t)$;
(c) $W(t)$ is a martingale.

2. Show that $W(t)$ is continuous.

3. Calculate $E[W^2(t)]$, $E[W^2(t) \mid \mathcal{F}_s]$, $E[W^3(t)]$, $E[W^4(t)]$, $E[e^{W(t)}]$
and $E[e^{W(t)} \mid \mathcal{F}_s]$.

4. Calculate the mathematical expectation of $W^n(t)$ for $n \in \mathbb{N}^\star$.
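Before proving the first two properties, it can help to check them numerically. The sketch below is only a Monte Carlo sanity check and not a proof; the values of `s`, `t` and the number of paths are arbitrary assumptions. It builds $W(t)$ from $W(s)$ plus an independent Gaussian increment.

```python
import math
import random

rng = random.Random(0)
s, t, n_paths = 0.5, 2.0, 50_000  # arbitrary illustrative values with s < t

sum_ws = sum_wt = sum_prod = 0.0
for _ in range(n_paths):
    # W(s) ~ N(0, s); W(t) = W(s) + independent increment ~ N(0, t - s)
    w_s = math.sqrt(s) * rng.gauss(0.0, 1.0)
    w_t = w_s + math.sqrt(t - s) * rng.gauss(0.0, 1.0)
    sum_ws += w_s
    sum_wt += w_t
    sum_prod += w_s * w_t

mean_ws = sum_ws / n_paths
mean_wt = sum_wt / n_paths
cov_hat = sum_prod / n_paths - mean_ws * mean_wt
# Compare mean_wt with 0 and cov_hat with min(s, t)
print(mean_wt, cov_hat, min(s, t))
```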


A.4.3 Stochastic integral for random step functions 


We assume that $f(t)$ and $g(t)$ are two random step functions on $[a, b]$:

\[
\begin{cases}
f(t) = f(t_i) \\
g(t) = g(t_i)
\end{cases}
\quad \text{if } t \in [t_i, t_{i+1})
\]

where $t_0 = a$ and $t_n = b$.

1. Demonstrate the linearity property and the Chasles decomposition of the stochastic
integral $\int_a^b f(t)\,dW(t)$.

2. Show that:


and:


3. Deduce that:


A.4.4 Power of Brownian motion 


Let W (t) be a standard Brownian motion. 


1. Show that:

\[ dW^2(t) = dt + 2 W(t)\,dW(t) \]

2. Deduce the solution of $I(t) = \int_0^t W(s)\,dW(s)$ and calculate the first two moments
of $I(t)$.

3. Let $n \in \mathbb{N}^\star$. Show that:

\[ dW^n(t) = \tfrac{1}{2} n (n - 1) W^{n-2}(t)\,dt + n W^{n-1}(t)\,dW(t) \]

4. Calculate the first two moments of the Itô integral $I_n(t) = \int_0^t W^n(s)\,dW(s)$.

5. Calculate the first two moments of the stochastic process $J_n(t) = W^n(t)$.

6. Calculate the first moment of the random process $K_n(t) = \int_0^t W^n(s)\,ds$.

7. Explain why it is difficult to calculate the second moment of $K_n(t)$.

8. Find the variance of $K_1(t)$, $K_2(t)$ and $K_3(t)$.

9. What is the relationship between $I_n(t)$, $J_n(t)$ and $K_n(t)$?
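Integrating the identity of the first question between 0 and $t$ gives $W^2(t) = t + 2\int_0^t W(s)\,dW(s)$, which can be checked on a discretized path. The sketch below is a numerical illustration under assumed settings (uniform grid, 100,000 steps, function name `ito_integral_w_dw`); it approximates the Itô integral by a sum with the integrand taken at the left end point of each interval.

```python
import math
import random


def ito_integral_w_dw(t, n_steps, rng):
    """Return (left-point sum approximating the integral of W dW, W(t))."""
    dt = t / n_steps
    w, integral = 0.0, 0.0
    for _ in range(n_steps):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        integral += w * dw  # integrand evaluated at the left end point
        w += dw
    return integral, w


rng = random.Random(1)
t = 1.0
integral, w_t = ito_integral_w_dw(t, 100_000, rng)
closed_form = 0.5 * (w_t**2 - t)
# The two numbers should be close when the grid is fine
print(integral, closed_form)
```

Evaluating the integrand at the right end point or at the midpoint instead would converge to a different limit, which is why the choice of the left end point matters here.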


A.4.5 Exponential of Brownian motion 


Let W (t) be a standard Brownian motion. 


1. Find the stochastic differential $d\,e^{W(t)}$.

2. Calculate the first two moments of $X(t) = e^{W(t)}$, $Y(t) = \int_0^t e^{W(s)}\,ds$ and
$Z(t) = \int_0^t e^{W(s)}\,dW(s)$.

3. Deduce the correlation between $Y(t)$ and $Z(t)$.
A.4.6 Exponential martingales

1. Show that $X(t) = e^{W(t)}$ is not a martingale.

2. Find the function $m(t)$ such that $M(t) = m(t) X(t)$ is a martingale.

3. We assume that $X(t) = g(t)$ is non-random. Let $Y$ be an $\mathcal{F}_t$-adapted process with:

\[ dY(t) = -\tfrac{1}{2} g^2(t)\,dt + g(t)\,dW(t) \]

Find the solution of $M(t) = e^{Y(t)}$. Show that $M(t)$ is a martingale.

4. We now assume that $X(t) = g(t)$ is random. How can we show that $M(t)$ is a
martingale?


A.4.7 Existence of solutions to stochastic differential equations 
1. We consider the following SDE:

\[ dX(t) = (1 + X(t))\,dt + 4\,dW(t) \]

Show that it has a unique solution.

2. Let $a$, $b$ and $c$ be three scalars. Show that the following SDE has a unique solution:

\[ dX(t) = a\,(b - X(t))\,dt + c X(t)\,dW(t) \]


A.4.8 Itô calculus and stochastic integration 


1. Find the solution of: x) i 
t 
dX (t) = i raO 


2. Find the solution of [30]:

\[ dX(t) = X^3(t)\,dt + X^2(t)\,dW(t) \]


3. Find the stochastic differential of: 


4. Deduce the stochastic differential of $Y(t) = (1 - t)^{-1} X(t)$. Find the solution of $Y(t)$.


5. Let $X(t) = f(t, W(t))$. Using Itô's lemma, find a necessary condition such that $X(t)$
is an $\mathcal{F}_t$-martingale.

6. Verify that the necessary condition is satisfied for the cubic martingale:

\[ X(t) = W^3(t) - 3 t W(t) \]

and the quartic martingale:

\[ X(t) = W^4(t) - 6 t W^2(t) + 3 t^2 \]

7. Show that $X(t) = e^{t/2} \cos W(t)$ is a martingale.
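A martingale has a constant expectation, so a quick numerical check is to verify that $E[X(t)] = X(0)$ for the three processes of questions 6 and 7. The sketch below is a Monte Carlo illustration under assumed settings ($t = 1$, 100,000 draws); constant expectation is only a necessary consequence of the martingale property, so this does not replace the proof.

```python
import math
import random

rng = random.Random(2)
t, n_paths = 1.0, 100_000  # arbitrary illustrative values

sum_cubic = sum_quartic = sum_cos = 0.0
for _ in range(n_paths):
    w = math.sqrt(t) * rng.gauss(0.0, 1.0)  # W(t) ~ N(0, t)
    sum_cubic += w**3 - 3.0 * t * w
    sum_quartic += w**4 - 6.0 * t * w**2 + 3.0 * t**2
    sum_cos += math.exp(t / 2.0) * math.cos(w)

mean_cubic = sum_cubic / n_paths
mean_quartic = sum_quartic / n_paths
mean_cos = sum_cos / n_paths
# Expected values at time 0: 0 for the cubic and quartic processes, 1 for the cosine one
print(mean_cubic, mean_quartic, mean_cos)
```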


[30] Hint: use the transform function $f(t, X(t)) = 1/X(0) - 1/X(t)$.
A.4.9 Solving a PDE with the Feynman-Kac formula


We assume that: 


\[
\begin{cases}
dX(t) = dt + dW(t) \\
X(0) = x
\end{cases}
\]


Let $V(t, x)$ be the solution of the following partial differential equation:


l E EE E lev ERA a) 4 hoes 


\[ V(5, x) = x \]

1. Using Girsanov theorem, show that:

\[ dX(t) = 2\,dt + dZ(t) \]

where $Z(t) = W(t) - \int_0^t ds$ is a Brownian motion.

2. Compute $E[X(5) \mid \mathcal{F}_t]$ and $E[X(5) \mid \mathcal{G}_t]$ where $\mathcal{F}_t$ is the natural filtration and $\mathcal{G}_t$ is
the filtration generated by the Brownian motion $Z(t)$. Find $E[X(5) \mid \mathcal{G}_0]$.

3. Solve the PDE (A.26) and compute $V(0, x)$. Check that the solution satisfies the PDE.

4. What does the solution become when the terminal value $V(5, x)$ of the PDE is equal
to $e^x$? Check that the solution satisfies the PDE.


A.4.10 Fokker-Planck equation 


1. We consider the Ornstein-Uhlenbeck process:

\[ dX(t) = a\,(b - X(t))\,dt + \sigma\,dW(t) \]

How can we calculate the density function using the Feynman-Kac representation?
Same question if we consider the Fokker-Planck equation. Solve numerically the two
PDEs and draw the density function $P\{X(1) = x \mid X(0) = 0\}$ when $a = 1$, $b = 10\%$
and $\sigma = 20\%$.

2. We consider the geometric Brownian motion:

\[ dX(t) = \mu X(t)\,dt + \sigma X(t)\,dW(t) \]

How can we calculate the density function using the Feynman-Kac representation?
Same question if we consider the Fokker-Planck equation. Solve numerically the two
PDEs and draw the density function $P\{X(1) = x \mid X(0) = 100\}$ when $\mu = 10\%$ and
$\sigma = 20\%$.
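Both processes have a known transition density (Gaussian for the Ornstein-Uhlenbeck process, log-normal for the geometric Brownian motion), which gives a convenient benchmark for the numerical PDE solutions. The sketch below evaluates these closed-form densities with the parameters of the exercise; the function names and the integration grids are assumptions made for illustration.

```python
import math


def ou_density(x, x0, a, b, sigma, t):
    """Gaussian transition density of the Ornstein-Uhlenbeck process at time t."""
    mean = x0 * math.exp(-a * t) + b * (1.0 - math.exp(-a * t))
    var = sigma**2 * (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a)
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gbm_density(x, x0, mu, sigma, t):
    """Log-normal transition density of the geometric Brownian motion (x > 0)."""
    m = math.log(x0) + (mu - 0.5 * sigma**2) * t
    s = sigma * math.sqrt(t)
    z = (math.log(x) - m) / s
    return math.exp(-0.5 * z**2) / (x * s * math.sqrt(2.0 * math.pi))


# Parameters of the exercise; the grids below are arbitrary choices
grid_ou = [-1.0 + 0.001 * i for i in range(2001)]
mass_ou = sum(ou_density(x, 0.0, 1.0, 0.10, 0.20, 1.0) for x in grid_ou) * 0.001
grid_gbm = [1.0 + 0.1 * i for i in range(4000)]
mass_gbm = sum(gbm_density(x, 100.0, 0.10, 0.20, 1.0) for x in grid_gbm) * 0.1
# Both Riemann sums should be close to 1 if the grids cover the bulk of the mass
print(mass_ou, mass_gbm)
```

A finite-difference solution of the Fokker-Planck equation can then be compared point by point with `ou_density` or `gbm_density` on the same grid.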


A.4.11 Dynamic strategy based on the current asset price 


We assume that the price process $S(t)$ follows a diffusion process given by the following
SDE:

\[ dS(t) = \mu(t, S(t))\,dt + \sigma(t, S(t))\,dW(t) \]


where S (0) = So. We consider a dynamic strategy V (t) that consists in being long or short 
on the asset S (t). We note n (t) the number of shares at time t and we assume that it only 
depends on the current asset price: 
\[ n(t) = f(S(t)) \]

1. Define $dV(t)$.

2. We define the function $F(S)$ as follows:

\[ F(S) = \int_c^S f(x)\,dx \]

where $c$ is a constant. Find the stochastic differential of $Y(t) = F(S(t))$.

3. Deduce an expression of the terminal value $V(T)$.

4. Show that $V(T)$ is composed of two terms:

\[ V(T) = G(T) + C(T) \]

where $G(T)$ only depends on the initial and terminal values of $S$ and $C(T)$ depends
on the trajectory of $S$. How can we interpret these two terms?

5. We consider the stop-loss strategy: $n(t) = \mathbb{1}\{S(t) > S_\star\}$ where $S_\star$ is the level of the
stop [31]. Show that the option profile of this strategy is a long-only exposure to the
asset plus a put option. What is the value of the option strike? What is the cost of
this trading strategy?

6. We consider the stop-gain strategy: $n(t) = \mathbb{1}\{S(t) < S_\star\}$ where $S_\star$ is the level of
the gain [32]. What is the option profile of this strategy? Why is the cost of this trading
strategy positive?

7. We assume the following reversal strategy:

\[ n(t) = m\,\frac{S_\star - S(t)}{S(t)} \]

where $S_\star$ is the price target of the asset and $m > 0$ is the leverage.

(a) Explain the rationale of this strategy.
(b) Find the value of $V(T)$.
(c) We assume that the diffusion coefficient $\sigma(t, S(t))$ is equal to $\sigma(t) S(t)$. Show
that:

\[ C(T) = \frac{m}{2}\,S_\star\,\mathrm{IV}(T) \]

where $\mathrm{IV}(T)$ is the integrated variance.

(d) Explain why the vega of the strategy is positive.


A.4.12 Strong Markov property and maximum of Brownian motion 


1. Let $M(t) = \sup_{s \le t} W(s)$ be the maximum of the Brownian motion. Show that:

\[ \Pr\{M(t) \ge x\} = 2\left(1 - \Phi\left(\frac{x}{\sqrt{t}}\right)\right) \]

2. Let $x \ge 0$ and $y \le x$. Show that:

\[ \Pr\{W(t) \ge 2x - y\} = \Pr\{M(t) \ge x, W(t) \le y\} \]


[31] We assume that $S(0) > S_\star$.
[32] We assume that $S(0) < S_\star$.
3. Calculate the joint density of $(M(t), W(t))$.

4. We now consider the maximum $M_X(t)$ of the process $X(t)$:

\[ X(t) = \mu t + W(t) \]

Using Girsanov theorem, find the joint distribution of $(M_X(t), X(t))$.

5. Deduce the density function of $M_X(t)$.

6. Verify that the distribution $F(x)$ of the maximum is given by:

\[ F(x) = \Phi\left(\frac{x - \mu t}{\sqrt{t}}\right) - e^{2 \mu x}\,\Phi\left(\frac{-x - \mu t}{\sqrt{t}}\right) \]
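The distribution of the maximum in the driftless case (first question of this exercise) can be checked by simulation. The sketch below is a rough Monte Carlo illustration with assumed settings (level $x = 1$, horizon $t = 1$, 500 time steps, 4,000 paths). Because the path is only observed on a grid, the simulated maximum is slightly too low, so the estimate tends to fall a little below the exact probability.

```python
import math
import random


def max_exceeds(x, t, n_steps, rng):
    """Check whether a discretized Brownian path on [0, t] reaches level x."""
    dt_sqrt = math.sqrt(t / n_steps)
    w = 0.0
    for _ in range(n_steps):
        w += dt_sqrt * rng.gauss(0.0, 1.0)
        if w >= x:
            return True
    return False


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


rng = random.Random(3)
x, t, n_paths = 1.0, 1.0, 4_000  # arbitrary illustrative values
hits = sum(max_exceeds(x, t, 500, rng) for _ in range(n_paths))
estimate = hits / n_paths
exact = 2.0 * (1.0 - std_normal_cdf(x / math.sqrt(t)))
# The estimate should be close to, and typically slightly below, the exact value
print(estimate, exact)
```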


A.4.13 Moments of the Cox-Ingersoll-Ross process 
We consider the CIR process defined by:

\[
\begin{cases}
dX(t) = a\,(b - X(t))\,dt + \sigma \sqrt{X(t)}\,dW(t) \\
X(0) = x_0
\end{cases}
\]

We recall that $X(t)$ is related to the noncentral chi-squared distribution. Indeed, we have:

\[ X(t) = \frac{1}{c}\,Y\left(\frac{4ab}{\sigma^2},\, c\,x_0\,e^{-at}\right) \]

where $Y(\nu, \zeta)$ is a noncentral chi-squared random variable whose number of degrees of
freedom is $\nu$ and noncentrality parameter is $\zeta$, and:

\[ c = \frac{4a}{\left(1 - e^{-at}\right) \sigma^2} \]

1. Calculate the mathematical expectation of $X(t)$.

2. Find the variance of $X(t)$.

3. Determine the skewness and excess kurtosis coefficients of $X(t)$.


A.4.14 Probability density function of Heston and SABR models 


1. We consider the Heston model:

\[
\begin{cases}
dX_1(t) = \mu X_1(t)\,dt + \sqrt{X_2(t)}\,X_1(t)\,dW_1(t) \\
dX_2(t) = a\,(b - X_2(t))\,dt + \sigma \sqrt{X_2(t)}\,dW_2(t)
\end{cases}
\]

where $E[W_1(t)\,W_2(t)] = \rho\,t$. Write the Fokker-Planck equation.

2. We consider the SABR model:

\[
\begin{cases}
dX_1(t) = X_2(t)\,X_1^{\beta}(t)\,dW_1(t) \\
dX_2(t) = \nu X_2(t)\,dW_2(t)
\end{cases}
\]

where $E[W_1(t)\,W_2(t)] = \rho\,t$. Write the Fokker-Planck equation.

3. Solve numerically the Fokker-Planck equation for the Heston and SABR models. The
parameters are the following:
(Heston) $\mu = 0$, $a = 2$, $b = 4\%$, $\sigma = 20\%$ and $\rho = -75\%$;
(SABR) $\beta = 1.00$, $\nu = 0.5$ and $\rho = -75\%$.

The initial values are $X_1(0) = 1$ and $X_2(0) = 6\%$. Draw the bivariate probability
density function for $t = 1/2$.




A.4.15 Discrete dynamic programming 


We assume that: 
s(k+1) = g(k,s(k),c(k)) = s (k) 


and: 


f (k, 8 (k),e(k)) = say ms lh) — 7 (c(k) — s (k))? + Bc(k) + v/s (kyesin s) 


The terminal value $f(K, s(K))$ is equal to $2 s(K) - 1$. The state variable $s(k)$ takes the
values $s_i = 1 + (i - 1)/2$ where $i = 1,\ldots,n_S$ while the control variable $c(k)$ takes the values
$c_j = j$ where $j = 1,\ldots,n_C$.

1. We set $\alpha = 0.02$, $\beta = 0.1$ and $\gamma = 0.01$.
(a) Compute the matrices J and C when K = 5, ns = 4 and nc = 8. 
(b) Deduce the optimal value J (1,1). 
(c) How do you explain that c* (k) % 3? 

2. We set $\alpha = 0.02$, $\beta = 0.1$ and $\gamma = 0.01$.
(a) Draw the values taken by J (k, s(k)) when K = 100, ns = 100 and nc = 25. 
(b) What is the optimal state at time k = 1? 


(c) Find the optimal control c* (k) when s (k) is equal to 3, 13 and 22. Comment on 
these results. 


A.4.16 Matrix computation 
1. We consider the following matrix:

\[
A = \begin{pmatrix}
1.000 & 0.500 & 0.700 \\
0.500 & 0.900 & 0.200 \\
0.700 & 0.200 & 0.300
\end{pmatrix}
\]

(a) Find the Schur decomposition.
(b) Calculate $e^A$ and $\ln A$.
(c) How to compute $\cos A$ and $\sin A$? Calculate $\cos^2 A + \sin^2 A$.
(d) Calculate $A^{1/2}$.


2. We consider the following covariance matrix:

\[
\Sigma = \begin{pmatrix}
0.04000 & 0.01500 & 0.00200 & -0.00600 \\
0.01500 & 0.02250 & -0.00375 & -0.00750 \\
0.00200 & -0.00375 & 0.00250 & -0.00250 \\
-0.00600 & -0.00750 & -0.00250 & 0.01000
\end{pmatrix}
\]


Compute the nearest covariance matrix X. 


3. We consider the matrix B = C; (—50%). Find the nearest correlation matrix p(B). 
What do you observe? Generalize this result. 
4. We consider the following matrix:

\[
C = \begin{pmatrix}
1.0 &     &     &     &     \\
0.9 & 1.0 &     &     &     \\
0.5 & 0.6 & 1.0 &     &     \\
0.2 & 0.9 & 0.0 & 1.0 &     \\
0.9 & 0.0 & 0.9 & 0.0 & 1.0
\end{pmatrix}
\]

Compute the nearest correlation matrix $\rho(C)$.

